**Hana Chockler · Georg Weissenbacher (Eds.)**

# LNCS 10981

# **Computer Aided Verification**

**30th International Conference, CAV 2018 Held as Part of the Federated Logic Conference, FloC 2018 Oxford, UK, July 14–17, 2018, Proceedings, Part I**

## Lecture Notes in Computer Science 10981

Commenced Publication in 1973 Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

#### Editorial Board

- David Hutchison, Lancaster University, Lancaster, UK
- Takeo Kanade, Carnegie Mellon University, Pittsburgh, PA, USA
- Josef Kittler, University of Surrey, Guildford, UK
- Jon M. Kleinberg, Cornell University, Ithaca, NY, USA
- Friedemann Mattern, ETH Zurich, Zurich, Switzerland
- John C. Mitchell, Stanford University, Stanford, CA, USA
- Moni Naor, Weizmann Institute of Science, Rehovot, Israel
- C. Pandu Rangan, Indian Institute of Technology Madras, Chennai, India
- Bernhard Steffen, TU Dortmund University, Dortmund, Germany
- Demetri Terzopoulos, University of California, Los Angeles, CA, USA
- Doug Tygar, University of California, Berkeley, CA, USA
- Gerhard Weikum, Max Planck Institute for Informatics, Saarbrücken, Germany

More information about this series at http://www.springer.com/series/7407


Editors

Hana Chockler, King's College London, London, UK

Georg Weissenbacher, TU Wien, Vienna, Austria

ISSN 0302-9743 ISSN 1611-3349 (electronic)
Lecture Notes in Computer Science
ISBN 978-3-319-96144-6 ISBN 978-3-319-96145-3 (eBook)
https://doi.org/10.1007/978-3-319-96145-3

Library of Congress Control Number: 2018948145

LNCS Sublibrary: SL1 – Theoretical Computer Science and General Issues

© The Editor(s) (if applicable) and The Author(s) 2018. This book is an open access publication.

Open Access This book is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.

### Preface

It was our privilege to serve as the program chairs for CAV 2018, the 30th International Conference on Computer-Aided Verification. CAV is an annual conference dedicated to the advancement of the theory and practice of computer-aided formal analysis methods for hardware and software systems. CAV 2018 was held in Oxford, UK, July 14–17, 2018, with the tutorial day on July 13.

This year, CAV was held as part of the Federated Logic Conference (FLoC) event and was collocated with many other conferences in logic. The primary focus of CAV is to spur advances in hardware and software verification while expanding to new domains such as learning, autonomous systems, and computer security. CAV is at the cutting edge of research in formal methods, as reflected in this year's program.

CAV 2018 covered a wide spectrum of subjects, from theoretical results to concrete applications, including papers on the application of formal methods in large-scale industrial settings. It has always been one of CAV's primary interests to include papers that describe practical verification tools, solutions, and techniques that ensure the high practical appeal of the results. The proceedings of the conference are published in Springer's Lecture Notes in Computer Science series. A selection of papers was invited to a special issue of Formal Methods in System Design and the Journal of the ACM.

This is the first year that the CAV proceedings are published under an Open Access license, thus giving access to CAV proceedings to a broad audience. We hope that this decision will increase the scope of practical applications of formal methods and will attract even more interest from industry.

CAV received a very high number of submissions this year—215 overall—resulting in a highly competitive selection process. We accepted 13 tool papers and 52 regular papers, which amounts to an acceptance rate of roughly 30% (for both regular papers and tool papers). The high number of excellent submissions in combination with the scheduling constraints of FLoC forced us to reduce the length of the talks to 15 minutes, giving equal exposure and weight to regular papers and tool papers.

The accepted papers cover a wide range of topics and techniques, from algorithmic and logical foundations of verification to practical applications in distributed, networked, cyber-physical, and autonomous systems. Other notable topics are synthesis, learning, security, and concurrency in the context of formal methods. The proceedings are organized according to the sessions in the conference.

The program featured two invited talks by Eran Yahav (Technion), on using deep learning for programming, and by Somesh Jha (University of Wisconsin–Madison), on adversarial deep learning. The invited talks this year reflect the growing interest of the CAV community in deep learning and its connection to formal methods. The tutorial day of CAV featured two invited tutorials, by Shaz Qadeer on verification of concurrent programs and by Matteo Maffei on static analysis of smart contracts. The subjects of the tutorials reflect the increasing volume of research on verification of concurrent software and, as of recently, the question of correctness of smart contracts. As every year, one of the winners of the CAV award also contributed a presentation. The tutorial day featured a workshop in memoriam of Mike Gordon, titled "Three Research Vignettes in Memory of Mike Gordon," organized by Tom Melham and jointly supported by the CAV and ITP communities.

Moreover, we continued the tradition of organizing a LogicLounge. Initiated by the late Helmut Veith at the Vienna Summer of Logic 2014, the LogicLounge is a series of discussions on computer science topics targeting a general audience and has become a regular highlight at CAV. This year's LogicLounge took place at the Oxford Union and was on the topic of "Ethics and Morality of Robotics," moderated by Judy Wajcman and featuring a panel of experts on the topic: Luciano Floridi, Ben Kuipers, Francesca Rossi, Matthias Scheutz, Sandra Wachter, and Jeannette Wing. We thank May Chan, Katherine Fletcher, and Marta Kwiatkowska for organizing this event, and the Vienna Center of Logic and Algorithms for their support.

In addition, CAV attendees enjoyed a number of FLoC plenary talks and events targeting the broad FLoC community.

In addition to the main conference, CAV hosted the Verification Mentoring Workshop for junior scientists entering the field and a high number of pre- and post-conference technical workshops: the Workshop on Formal Reasoning in Distributed Algorithms (FRIDA), the workshop on Runtime Verification for Rigorous Systems Engineering (RV4RISE), the 5th Workshop on Horn Clauses for Verification and Synthesis (HCVS), the 7th Workshop on Synthesis (SYNT), the First International Workshop on Parallel Logical Reasoning (PLR), the 10th Working Conference on Verified Software: Theories, Tools and Experiments (VSTTE), the Workshop on Machine Learning for Programming (MLP), the 11th International Workshop on Numerical Software Verification (NSV), the Workshop on Verification of Engineered Molecular Devices and Programs (VEMDP), the Third Workshop on Fun With Formal Methods (FWFM), the Workshop on Robots, Morality, and Trust through the Verification Lens, and the IFAC Conference on Analysis and Design of Hybrid Systems (ADHS).

The Program Committee (PC) for CAV consisted of 80 members; we kept the number large to ensure each PC member would have a reasonable number of papers to review and be able to provide thorough reviews. As the review process for CAV is double-blind, we kept the number of external reviewers to a minimum, to avoid accidental disclosures and conflicts of interest. Altogether, the reviewers drafted over 860 reviews and made an enormous effort to ensure a high-quality program. Following the tradition of CAV in recent years, the artifact evaluation was mandatory for tool submissions and optional but encouraged for regular submissions. We used an Artifact Evaluation Committee of 25 members. Our goal for artifact evaluation was to provide friendly "beta-testing" to tool developers; we recognize that developing a stable tool on a cutting-edge research topic is certainly not easy and we hope the constructive comments provided by the Artifact Evaluation Committee (AEC) were of help to the developers. As a result of the evaluation, the AEC accepted 25 of 31 artifacts accompanying regular papers; moreover, all 13 accepted tool papers passed the evaluation. We are grateful to the reviewers for their outstanding efforts in making sure each paper was fairly assessed. We would like to thank our artifact evaluation chair, Igor Konnov, and the AEC for evaluating all artifacts submitted with tool papers as well as optional artifacts submitted with regular papers.

Of course, without the tremendous effort put into the review process by our PC members this conference would not have been possible. We would like to thank the PC members for their effort and thorough reviews.

We would like to thank the FLoC chairs, Moshe Vardi, Daniel Kroening, and Marta Kwiatkowska, for the support provided, Thanh Hai Tran for maintaining the CAV website, and the always helpful Steering Committee members Orna Grumberg, Aarti Gupta, Daniel Kroening, and Kenneth McMillan. Finally, we would like to thank the team at the University of Oxford, who took care of the administration and organization of FLoC, thus making our jobs as CAV chairs much easier.

July 2018 Hana Chockler Georg Weissenbacher

### Organization

#### Program Committee

- Aws Albarghouthi, University of Wisconsin-Madison, USA
- Christel Baier, TU Dresden, Germany
- Clark Barrett, Stanford University, USA
- Ezio Bartocci, TU Wien, Austria
- Dirk Beyer, LMU Munich, Germany
- Per Bjesse, Synopsys Inc., USA
- Jasmin Christian Blanchette, Vrije Universiteit Amsterdam, The Netherlands
- Roderick Bloem, Graz University of Technology, Austria
- Ahmed Bouajjani, IRIF, University Paris Diderot, France
- Pavol Cerny, University of Colorado Boulder, USA
- Rohit Chadha, University of Missouri, USA
- Swarat Chaudhuri, Rice University, USA
- Wei-Ngan Chin, National University of Singapore, Singapore
- Hana Chockler, King's College London, UK
- Alessandro Cimatti, Fondazione Bruno Kessler, Italy
- Loris D'Antoni, University of Wisconsin-Madison, USA
- Vijay D'Silva, Google, USA
- Cristina David, University of Cambridge, UK
- Jyotirmoy Deshmukh, University of Southern California, USA
- Isil Dillig, The University of Texas at Austin, USA
- Cezara Dragoi, Inria Paris, ENS, France
- Kerstin Eder, University of Bristol, UK
- Michael Emmi, Nokia Bell Labs, USA
- Georgios Fainekos, Arizona State University, USA
- Dana Fisman, University of Pennsylvania, USA
- Vijay Ganesh, University of Waterloo, Canada
- Sicun Gao, University of California San Diego, USA
- Alberto Griggio, Fondazione Bruno Kessler, Italy
- Orna Grumberg, Technion - Israel Institute of Technology, Israel
- Arie Gurfinkel, University of Waterloo, Canada
- William Harrison, Department of CS, University of Missouri, Columbia, USA
- Gerard Holzmann, Nimble Research, USA
- Alan J. Hu, The University of British Columbia, Canada
- Franjo Ivancic, Google, USA
- Alexander Ivrii, IBM, Israel
- Himanshu Jain, Synopsys, USA
- Somesh Jha, University of Wisconsin-Madison, USA
- Susmit Jha, SRI International, USA
- Ranjit Jhala, University of California San Diego, USA
- Barbara Jobstmann, EPFL and Cadence Design Systems, Switzerland
- Stefan Kiefer, University of Oxford, UK
- Zachary Kincaid, Princeton University, USA
- Laura Kovacs, TU Wien, Austria
- Viktor Kuncak, Ecole Polytechnique Fédérale de Lausanne, Switzerland
- Orna Kupferman, Hebrew University, Israel
- Shuvendu Lahiri, Microsoft, USA
- Rupak Majumdar, MPI-SWS, Germany
- Ken McMillan, Microsoft, USA
- Alexander Nadel, Intel, Israel
- Mayur Naik, Intel, USA
- Kedar Namjoshi, Nokia Bell Labs, USA
- Dejan Nickovic, Austrian Institute of Technology AIT, Austria
- Corina Pasareanu, CMU/NASA Ames Research Center, USA
- Nir Piterman, University of Leicester, UK
- Pavithra Prabhakar, Kansas State University, USA
- Mitra Purandare, IBM Research Laboratory Zurich, Switzerland
- Shaz Qadeer, Microsoft, USA
- Arjun Radhakrishna, Microsoft, USA
- Noam Rinetzky, Tel Aviv University, Israel
- Philipp Ruemmer, Uppsala University, Sweden
- Roopsha Samanta, Purdue University, USA
- Sriram Sankaranarayanan, University of Colorado, Boulder, USA
- Martina Seidl, Johannes Kepler University Linz, Austria
- Koushik Sen, University of California, Berkeley, USA
- Sanjit A. Seshia, University of California, Berkeley, USA
- Natasha Sharygina, Università della Svizzera Italiana, Lugano, Switzerland
- Sharon Shoham, Tel Aviv University, Israel
- Anna Slobodova, Centaur Technology, USA
- Armando Solar-Lezama, MIT, USA
- Ofer Strichman, Technion, Israel
- Serdar Tasiran, Amazon Web Services, USA
- Caterina Urban, ETH Zurich, Switzerland
- Yakir Vizel, Technion, Israel
- Tomas Vojnar, Brno University of Technology, Czechia
- Thomas Wahl, Northeastern University, USA
- Bow-Yaw Wang, Academia Sinica, Taiwan
- Georg Weissenbacher, TU Wien, Austria
- Thomas Wies, New York University, USA
- Karen Yorav, IBM Research Laboratory Haifa, Israel
- Lenore Zuck, University of Illinois in Chicago, USA
- Damien Zufferey, MPI-SWS, Germany
- Florian Zuleger, TU Wien, Austria

#### Artifact Evaluation Committee


#### Additional Reviewers


- Cohen, Ernie
- Costea, Andreea
- Dangl, Matthias
- Doko, Marko
- Drachsler Cohen, Dana
- Dreossi, Tommaso
- Dutra, Rafael
- Ebrahimi, Masoud
- Eisner, Cindy
- Fedyukovich, Grigory
- Fremont, Daniel
- Freund, Stephen
- Friedberger, Karlheinz
- Ghorbani, Soudeh
- Ghosh, Shromona
- Goel, Shilpi
- Gong, Liang
- Govind, Hari
- Gu, Yijia
- Habermehl, Peter
- Hamza, Jad
- He, Paul
- Heo, Kihong
- Holik, Lukas
- Humenberger, Andreas
- Hyvärinen, Antti
- Hölzl, Johannes
- Iusupov, Rinat
- Jacobs, Swen
- Jain, Mitesh
- Jaroschek, Maximilian
- Jha, Sumit Kumar
- Keidar-Barner, Sharon
- Khalimov, Ayrat
- Kiesl, Benjamin
- Koenighofer, Bettina
- Krstic, Srdjan
- Laeufer, Kevin
- Lee, Woosuk
- Lemberger, Thomas
- Lemieux, Caroline
- Lewis, Robert
- Liang, Jia
- Liang, Jimmy
- Liu, Peizun
- Lång, Magnus
- Maffei, Matteo
- Marescotti, Matteo
- Mathur, Umang
- Miné, Antoine
- Mora, Federico
- Nevo, Ziv
- Ochoa, Martin
- Orni, Avigail
- Ouaknine, Joel
- Padhye, Rohan
- Padon, Oded
- Partush, Nimrod
- Pavlinovic, Zvonimir
- Pavlogiannis, Andreas
- Peled, Doron
- Pendharkar, Ishan
- Peng, Yan
- Petri, Gustavo
- Polozov, Oleksandr
- Popescu, Andrei
- Potomkin, Kostiantyn
- Raghothaman, Mukund
- Reynolds, Andrew
- Reynolds, Thomas
- Ritirc, Daniela
- Rogalewicz, Adam
- Scott, Joe
- Shacham, Ohad
- Song, Yahui
- Sosnovich, Adi
- Sousa, Marcelo
- Subramanian, Kausik
- Sumners, Rob
- Swords, Sol
- Ta, Quang Trung
- Tautschnig, Michael
- Traytel, Dmitriy
- Trivedi, Ashutosh
- Udupa, Abhishek
- van Dijk, Tom
- Wendler, Philipp
- Zdancewic, Steve
- Zulkoski, Ed

### Contents – Part I

#### Invited Papers


#### Program Analysis Using Polyhedra



#### Runtime Verification, Hybrid and Timed Systems


#### Probabilistic Systems


### Contents – Part II

#### Tools





# Invited Papers

### **Semantic Adversarial Deep Learning**

Tommaso Dreossi<sup>1</sup>, Somesh Jha<sup>2(B)</sup>, and Sanjit A. Seshia<sup>1</sup>

<sup>1</sup> University of California at Berkeley, Berkeley, USA
{dreossi,sseshia}@berkeley.edu
<sup>2</sup> University of Wisconsin, Madison, Madison, USA
jha@cs.wisc.edu

**Abstract.** Fueled by massive amounts of data, models produced by machine-learning (ML) algorithms, especially deep neural networks, are being used in diverse domains where trustworthiness is a concern, including automotive systems, finance, health care, natural language processing, and malware detection. Of particular concern is the use of ML algorithms in cyber-physical systems (CPS), such as self-driving cars and aviation, where an adversary can cause serious consequences.

However, existing approaches to generating adversarial examples and devising robust ML algorithms mostly ignore the *semantics* and *context* of the overall system containing the ML component. For example, in an autonomous vehicle using deep learning for perception, not every adversarial example for the neural network might lead to a harmful consequence. Moreover, one may want to prioritize the search for adversarial examples towards those that significantly modify the desired semantics of the overall system. Along the same lines, existing algorithms for constructing robust ML algorithms ignore the specification of the overall system. In this paper, we argue that the semantics and specification of the overall system has a crucial role to play in this line of research. We present preliminary research results that support this claim.

#### **1 Introduction**

*Machine learning (ML)* algorithms, fueled by massive amounts of data, are increasingly being utilized in several domains, including healthcare, finance, and transportation. Models produced by ML algorithms, especially *deep neural networks* (DNNs), are being deployed in domains where trustworthiness is a big concern, such as automotive systems [35], finance [25], health care [2], computer vision [28], speech recognition [17], natural language processing [38], and cybersecurity [8,42]. Of particular concern is the use of ML (including deep learning) in *cyber-physical systems* (CPS) [29], where the presence of an adversary can cause serious consequences. For example, much of the technology behind autonomous and driver-less vehicle development is "powered" by machine learning [4,14]. DNNs have also been used in airborne collision avoidance systems for unmanned aircraft (ACAS Xu) [22]. However, *in designing and deploying these algorithms in critical cyber-physical systems, the presence of an active adversary is often ignored.*

© The Author(s) 2018. H. Chockler and G. Weissenbacher (Eds.): CAV 2018, LNCS 10981, pp. 3–26, 2018. https://doi.org/10.1007/978-3-319-96145-3_1

*Adversarial machine learning (AML)* is a field concerned with analyzing the vulnerability of ML algorithms to adversarial attacks, and with using such analysis to make ML algorithms robust to attacks. It is part of the broader agenda for safe and verified ML-based systems [39,41]. In this paper, we first give a brief survey of the field of AML, with a particular focus on deep learning. We focus mainly on "external attacks": attacks on the outputs or models produced by ML algorithms that occur *after training*, which are especially relevant to cyber-physical systems (e.g., for a driverless car, the ML algorithm used for navigation has already been trained by the manufacturer once the "car is on the road"). These attacks are more realistic and are distinct from other types of attacks on ML models, such as attacks that poison the training data (see the paper [18] for a survey of such attacks). We survey attacks caused by *adversarial examples*, which are inputs crafted by adding small, often imperceptible, perturbations to force a trained ML model to misclassify.

We contend that the work on adversarial ML, while important and useful, is not enough. In particular, we advocate for the increased use of *semantics* in adversarial analysis and design of ML algorithms. *Semantic adversarial learning* explores a space of semantic modifications to the data, uses system-level semantic specifications in the analysis, utilizes semantic adversarial examples in training, and produces not just output labels but also additional semantic information. Focusing on deep learning, we explore these ideas and provide initial experimental data to support them.

**Roadmap.** Section 2 provides the relevant background. A brief survey of adversarial analysis is given in Sect. 3. Our proposal for semantic adversarial learning is given in Sect. 4.

#### **2 Background**

**Background on Machine Learning.** Next we describe some general concepts in machine learning (ML). We consider the supervised learning setting. Consider a sample space $Z$ of the form $X \times Y$, and an ordered training set $S = ((x_i, y_i))_{i=1}^m$ ($x_i$ is the data and $y_i$ is the corresponding label). Let $H$ be a hypothesis space (e.g., weights corresponding to a logistic-regression model). There is a loss function $\ell : H \times Z \to \mathbb{R}$ so that given a hypothesis $w \in H$ and a sample $(x, y) \in Z$, we obtain a loss $\ell(w, (x, y))$. We consider the case where we want to minimize the loss over the training set $S$,

$$L\_S(w) = \frac{1}{m} \sum\_{i=1}^m \ell(w, (x\_i, y\_i)) + \lambda \mathcal{R}(w).$$

In the equation given above, $\lambda > 0$ and the term $\mathcal{R}(w)$ is called the *regularizer*; it enforces "simplicity" in $w$. Since $S$ is fixed, we sometimes write $\ell_i(w) = \ell(w, (x_i, y_i))$ as a function of $w$ alone. We wish to find a $w$ that minimizes $L_S(w)$, i.e., to solve the following optimization problem:

$$\min\_{w \in H} L\_S(w)$$

**Example:** We will consider the example of logistic regression. In this case $X = \mathbb{R}^n$, $Y = \{+1, -1\}$, $H = \mathbb{R}^n$, and the loss function $\ell(w, (x, y))$ is as follows ($w^T \cdot x$ denotes the dot product of the vectors $w$ and $x$):

$$\log\left(1+e^{-y\left(w^T\cdot x\right)}\right)$$

If we use the $L_2$ regularizer (i.e., $\mathcal{R}(w) = \|w\|_2$), then $L_S(w)$ becomes:

$$\frac{1}{m} \sum_{i=1}^{m} \log\left(1 + e^{-y_i(w^T \cdot x_i)}\right) + \lambda \|w\|_2$$
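As an illustrative sketch (not code from the paper), the regularized logistic loss $L_S(w)$ above can be computed directly with NumPy; the function name and interface here are our own:

```python
import numpy as np

def logistic_loss(w, X, y, lam):
    """Regularized logistic loss L_S(w) for labels y in {-1, +1}.

    X has shape (m, n), w has shape (n,), and lam is the regularization
    weight lambda.  Uses the L2 regularizer R(w) = ||w||_2.
    """
    margins = y * (X @ w)                       # y_i * (w^T . x_i)
    data_term = np.mean(np.log1p(np.exp(-margins)))
    return data_term + lam * np.linalg.norm(w, 2)
```

With $w = 0$, every margin is zero and the loss reduces to $\log 2$, a handy sanity check.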

**Stochastic Gradient Descent.** *Stochastic gradient descent (SGD)* is a popular method for solving optimization tasks (such as the problem $\min_{w \in H} L_S(w)$ we considered before). In a nutshell, SGD performs a series of updates, where each update is a gradient-descent step with respect to a small set of points sampled from the training set. Specifically, suppose that we perform $T$ SGD steps. There are two typical forms of SGD. In the first form, which we call Sample-SGD, we uniformly and randomly sample $i_t \sim [m]$ at time $t$, and perform a gradient-descent step based on the $i_t$-th sample $(x_{i_t}, y_{i_t})$:

$$w_{t+1} = G_{\ell_{i_t}, \eta_t}(w_t) = w_t - \eta_t \, \ell'_{i_t}(w_t) \tag{1}$$

where $w_t$ is the hypothesis at time $t$, $\eta_t$ is a parameter called the *learning rate*, and $\ell'_{i_t}(w_t)$ denotes the derivative of $\ell_{i_t}(w)$ evaluated at $w_t$. We will abbreviate $G_{\ell_{i_t}, \eta_t}$ as $G_t$. In the second form, which we call Perm-SGD, we first perform a random permutation of $S$, and then apply Eq. 1 $T$ times by cycling through $S$ according to the order of the permutation. The process of SGD can be summarized as a diagram:

$$w\_0 \stackrel{G\_1}{\longrightarrow} w\_1 \stackrel{G\_2}{\longrightarrow} \cdots \stackrel{G\_t}{\longrightarrow} w\_t \stackrel{G\_{t+1}}{\longrightarrow} \cdots \stackrel{G\_T}{\longrightarrow} w\_T$$
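The Sample-SGD update of Eq. 1 can be sketched for the (unregularized) logistic loss as follows; this is our own illustrative implementation, assuming a constant learning rate:

```python
import numpy as np

def sample_sgd(X, y, eta, T, rng=None):
    """Sample-SGD for the unregularized logistic loss, labels in {-1, +1}.

    At each step t, samples i_t uniformly from [m] and applies
    w_{t+1} = w_t - eta * l'_{i_t}(w_t), with a constant learning rate.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    m, n = X.shape
    w = np.zeros(n)
    for _ in range(T):
        i = rng.integers(m)                     # i_t ~ uniform over [m]
        margin = y[i] * (X[i] @ w)
        # gradient of log(1 + exp(-y_i w.x_i)) with respect to w
        grad = -y[i] * X[i] / (1.0 + np.exp(margin))
        w = w - eta * grad
    return w
```

On a trivially separable data set, a few hundred steps already yield a hypothesis that classifies every training point correctly.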

**Classifiers.** The output of the learning algorithm gives us a *classifier*, which is a function from $\mathbb{R}^n$ to $C$, where $\mathbb{R}$ denotes the set of reals and $C$ is the set of class labels. To emphasize that a classifier depends on a hypothesis $w \in H$, which is the output of the learning algorithm described earlier, we write it as $F_w$ (if $w$ is clear from the context, we sometimes simply write $F$). For example, after training, in the case of logistic regression we obtain a function from $\mathbb{R}^n$ to $\{-1, +1\}$. Vectors are denoted in boldface, and the $r$-th component of a vector $\mathbf{x}$ is denoted by $\mathbf{x}[r]$.

Throughout the paper, we refer to the function $s(F_w)$ as the *softmax layer* corresponding to the classifier $F_w$. In the case of logistic regression, $s(F_w)(\mathbf{x})$ is the following pair (the first element is the probability of $-1$ and the second one is the probability of $+1$):

$$\left\langle \frac{1}{1 + e^{w^T \cdot \mathbf{x}}}, \; \frac{1}{1 + e^{-w^T \cdot \mathbf{x}}} \right\rangle$$

Formally, let $c = |C|$ and let $F_w$ be a classifier; we let $s(F_w)$ be the function that maps $\mathbb{R}^n$ to $\mathbb{R}^c_+$ such that $\|s(F_w)(\mathbf{x})\|_1 = 1$ for any $\mathbf{x}$ (i.e., $s(F_w)$ computes a probability vector). We denote by $s(F_w)(\mathbf{x})[l]$ the probability of $s(F_w)(\mathbf{x})$ at label $l$. Recall that the softmax function maps $\mathbb{R}^k$ to a probability distribution over $\{1, \dots, k\} = [k]$ such that the probability of $j \in [k]$ for a vector $\mathbf{x} \in \mathbb{R}^k$ is

$$\frac{e^{\mathbf{x}[j]}}{\sum\_{r=1}^{k} e^{\mathbf{x}[r]}}$$

Some classifiers $F_w(\mathbf{x})$ are of the form $\arg\max_l s(F_w)(\mathbf{x})[l]$ (i.e., the classifier $F_w$ outputs the label with the maximum probability according to the "softmax layer"). For example, in several deep neural network (DNN) architectures the last layer is the *softmax* layer. We assume that the reader is familiar with the basics of DNNs; readers who are not may refer to the excellent book by Goodfellow et al. [15].
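As a small illustration (our own sketch, not from the paper), the softmax map and the induced $\arg\max$ classifier can be written as:

```python
import numpy as np

def softmax(z):
    """Map a score vector in R^k to a probability vector (sums to 1)."""
    z = z - np.max(z)             # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def classify(scores):
    """Return the label (index) with maximum softmax probability."""
    return int(np.argmax(softmax(scores)))
```

Because softmax is monotone in each score, the $\arg\max$ over probabilities agrees with the $\arg\max$ over the raw scores.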

**Background on Logic.** Temporal logics are commonly used for specifying desired and undesired properties of systems. For cyber-physical systems, it is common to use temporal logics that can specify properties of real-valued signals over real time, such as *signal temporal logic* (STL) [30] or *metric temporal logic* (MTL) [27].

A *signal* is a function $s : D \to S$, with $D \subseteq \mathbb{R}_{\geq 0}$ an interval and either $S \subseteq \mathbb{B}$ or $S \subseteq \mathbb{R}$, where $\mathbb{B} = \{\top, \bot\}$ and $\mathbb{R}$ is the set of reals. Signals defined on $\mathbb{B}$ are called *Boolean*, while those on $\mathbb{R}$ are called *real-valued*. A *trace* $w = \{s_1, \dots, s_n\}$ is a finite set of real-valued signals defined over the same interval $D$. We use variables $x_i$ to denote the value of a real-valued signal at a particular time instant.

Let $\Sigma = \{\sigma_1, \dots, \sigma_k\}$ be a finite set of predicates $\sigma_i : \mathbb{R}^n \to \mathbb{B}$, with $\sigma_i \equiv p_i(x_1, \dots, x_n) \lhd 0$, $\lhd \in \{<, \leq\}$, and $p_i : \mathbb{R}^n \to \mathbb{R}$ a function in the variables $x_1, \dots, x_n$. An STL formula is defined by the following grammar:

$$\varphi := \sigma \mid \neg\varphi \mid \varphi \wedge \varphi \mid \varphi \, \mathcal{U}_I \, \varphi \tag{2}$$

where $\sigma \in \Sigma$ is a predicate and $I \subset \mathbb{R}_{\geq 0}$ is a closed non-singular interval. Other common temporal operators can be defined as syntactic abbreviations in the usual way, for instance $\varphi_1 \vee \varphi_2 := \neg(\neg\varphi_1 \wedge \neg\varphi_2)$, $F_I\,\varphi := \top\, \mathcal{U}_I\, \varphi$, or $G_I\,\varphi := \neg F_I \neg\varphi$. Given $t \in \mathbb{R}_{\geq 0}$, the shifted interval is defined as $t + I = \{t + t' \mid t' \in I\}$. The qualitative (or Boolean) semantics of STL is given in the usual way:

**Definition 1 (Qualitative semantics).** *Let* $w$ *be a trace,* $t \in \mathbb{R}_{\geq 0}$*, and* $\varphi$ *an STL formula. The* qualitative semantics *of* $\varphi$ *is inductively defined as follows:*

$$\begin{aligned} w, t &\models \sigma \text{ iff } \sigma(w(t)) \text{ is } true\\ w, t &\models \neg\varphi \text{ iff } w, t \not\models \varphi\\ w, t &\models \varphi_1 \wedge \varphi_2 \text{ iff } w, t \models \varphi_1 \text{ and } w, t \models \varphi_2\\ w, t &\models \varphi_1 \, \mathcal{U}_I \, \varphi_2 \text{ iff } \exists t' \in t + I \text{ s.t. } w, t' \models \varphi_2 \text{ and } \forall t'' \in [t, t'],\; w, t'' \models \varphi_1 \end{aligned} \tag{3}$$

A trace $w$ satisfies a formula $\varphi$ if and only if $w, 0 \models \varphi$, in short $w \models \varphi$. STL also admits a quantitative or robust semantics, which we omit for brevity. The robust semantics provides quantitative information on the formula, telling how strongly the specification is satisfied or violated by a given trace.
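To make the qualitative semantics concrete, here is a minimal evaluator sketch for Definition 1 under the simplifying assumption of a discrete, uniformly sampled trace (sample indices stand in for time instants); the formula encoding is our own:

```python
# Recursive evaluator for the qualitative STL semantics on a discrete,
# uniformly sampled trace (a simplifying assumption: time instants are
# the sample indices 0, 1, 2, ...).

def stl_sat(phi, w, t):
    """Return True iff w, t |= phi.

    w:   dict mapping signal names to lists of samples.
    phi: nested tuples, e.g. ('pred', f), ('not', p), ('and', p, q),
         ('until', (a, b), p, q) with [a, b] an index interval.
    """
    op = phi[0]
    if op == 'pred':                       # sigma: predicate on w(t)
        return phi[1]({k: v[t] for k, v in w.items()})
    if op == 'not':
        return not stl_sat(phi[1], w, t)
    if op == 'and':
        return stl_sat(phi[1], w, t) and stl_sat(phi[2], w, t)
    if op == 'until':                      # phi1 U_[a,b] phi2
        (a, b), p1, p2 = phi[1], phi[2], phi[3]
        n = min(len(v) for v in w.values())
        for tp in range(t + a, min(t + b, n - 1) + 1):
            if stl_sat(p2, w, tp) and all(
                    stl_sat(p1, w, tpp) for tpp in range(t, tp + 1)):
                return True
        return False
    raise ValueError(op)
```

Disjunction, `F_I`, and `G_I` can then be layered on top as the syntactic abbreviations given above.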

#### **3 Attacks**

There are several types of attacks on ML algorithms. For excellent material on various attacks on ML algorithms we refer the reader to [3,18]. For example, in *training time* attacks an adversary wishes to poison a data set so that a "bad" hypothesis is learned by an ML-algorithm. This attack can be modeled as a game between the algorithm ML and an adversary A as follows:


The adversary $A$ chooses a set $\widehat{S}$ of poisoned data points and adds it to the training set $S$; the algorithm ML then learns a hypothesis attaining the minimum

$$\min_{w \in H} L_{S \cup \widehat{S}}(w).$$

The attacker wants to maximize the above quantity and thus chooses $\widehat{S}$ such that $\min_{w \in H} L_{S \cup \widehat{S}}(w)$ is maximized. For a recent paper on certified defenses against such attacks, we refer the reader to [44]. In *model extraction* attacks, an adversary with black-box access to a classifier, but no prior knowledge of the parameters of the ML algorithm or of the training data, aims to duplicate the functionality of (i.e., steal) the classifier by querying it on well-chosen data points. For examples of model-extraction attacks, see [45].

In this paper, we consider *test-time attacks*. We assume that the classifier $F_w$ has been trained without any interference from the attacker (i.e., no training-time attacks). Roughly speaking, an attacker has an input $\mathbf{x}$ (e.g., an image of a stop sign) and wants to craft a perturbation $\delta$ so that the label of $\mathbf{x} + \delta$ is what the attacker desires (e.g., a yield sign). The next subsection describes test-time attacks in detail. We will sometimes refer to $F_w$ simply as $F$, but the hypothesis $w$ is always lurking in the background (i.e., whenever we refer to $w$, it corresponds to the classifier $F$).

#### **3.1 Test-Time Attacks**

The adversarial goal is to take any input vector $\mathbf{x} \in \mathbb{R}^n$ and produce a minimally altered version of $\mathbf{x}$, the *adversarial sample*, that has the property of being misclassified by a classifier $F : \mathbb{R}^n \to C$. Formally speaking, an adversary wishes to solve the following optimization problem:

$$\begin{array}{ll} \min\_{\delta \in \mathbb{R}^n} & \mu(\delta) \\ \text{such that } F(\mathbf{x} + \delta) \in T \\ & \delta \cdot \mathbf{M} = 0 \end{array}$$

The various terms in the formulation are as follows: μ is a metric on $\mathbb{R}^n$; $T \subseteq C$ is a subset of the labels (the reader should think of T as the target labels for the attacker); and **M** (called the *mask*) is an n-dimensional 0–1 vector. The objective function minimizes the metric μ on the perturbation δ. Next we describe the various constraints in the formulation.

– F(**<sup>x</sup>** <sup>+</sup> δ) <sup>∈</sup> T

The set T constrains the perturbed vector $\mathbf{x}+\delta$<sup>1</sup> to have a label (according to F) in the set T. For *mis-classification* problems the labels of **x** and **x** + δ are different, so we have $T = C - \{F(\mathbf{x})\}$. For *targeted mis-classification* we have $T = \{t\}$ (for $t \in C$), where t is the target that the attacker wants (e.g., the attacker wants t to correspond to a yield sign).

– δ · **<sup>M</sup>** = 0

The vector **M** acts as a mask: if M[i] = 1 then δ[i] is forced to be 0. In other words, the attacker can only perturb dimension i if the i-th component of **M** is 0, which means that δ lies in a k-dimensional space, where k is the number of zero entries of **M**. This constraint is important if an attacker wants to perturb only a certain area of the image (e.g., the glasses in a picture of a person).

– *Convexity*

Notice that even if the metric μ is convex (e.g., μ is the $L_2$ norm), the optimization problem is *not convex* because of the constraint involving F (the constraint $\delta \cdot \mathbf{M} = 0$ is convex). In general, solving convex optimization problems is more tractable than solving non-convex ones [34].

Note that the constraint $\delta \cdot \mathbf{M} = 0$ essentially constrains the vector to a lower-dimensional space and does not add complexity to the optimization problem. Therefore, for the rest of the section we will ignore that constraint and work with the following formulation:

$$\begin{array}{ll} \min\_{\delta \in \mathbb{R}^n} & \mu(\delta) \\ \text{such that } F(\mathbf{x} + \delta) \in T \\ \end{array}$$

**FGSM Mis-classification Attack -** This algorithm is known as the *fast gradient sign method (FGSM)* [16]. The adversary crafts an adversarial sample $\mathbf{x}^\star = \mathbf{x} + \delta$ for a given legitimate sample **x** by computing the following perturbation:

$$\delta = \varepsilon \operatorname{sign}(\nabla\_{\mathbf{x}} L\_F(\mathbf{x})) \tag{4}$$

The function $L_F(\mathbf{x})$ is shorthand for $\ell(w, \mathbf{x}, l(\mathbf{x}))$, where w is the hypothesis corresponding to the classifier F, **x** is the data point, and $l(\mathbf{x})$ is the label of **x** (essentially we evaluate the loss function at the hypothesis corresponding to the classifier). The gradient of the function $L_F$ is computed with respect to

<sup>1</sup> The vectors are added component wise.

**x** using sample **x** and label $y = l(\mathbf{x})$ as inputs. Note that $\nabla_{\mathbf{x}} L_F(\mathbf{x})$ is an n-dimensional vector and $\mathrm{sign}(\nabla_{\mathbf{x}} L_F(\mathbf{x}))$ is an n-dimensional vector whose i-th element is the sign of $\nabla_{\mathbf{x}} L_F(\mathbf{x})[i]$. The *input variation parameter* ε controls the amplitude of the perturbation: increasing its value increases the likelihood that $\mathbf{x}^\star$ is misclassified by the classifier F, but also makes the adversarial sample easier for humans to detect. The key idea is that FGSM takes a step *in the direction of the gradient of the loss function*, thereby trying to maximize the loss. Recall that SGD takes a step in the opposite direction because it is trying to minimize the loss function.
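As a minimal illustration of Eq. (4), consider a binary logistic-regression "classifier" in place of a DNN (an assumption made here purely for self-containment): the loss $L(\mathbf{x}) = -\log p(y\,|\,\mathbf{x})$ with $p = \sigma(w \cdot \mathbf{x})$ has the closed-form input gradient $(p - y)\,w$, so the FGSM perturbation can be computed directly.

```python
import math

def fgsm(w, x, y, eps):
    """FGSM (Eq. 4) for binary logistic regression: delta = eps * sign(grad_x L).

    Loss is L(x) = -log p(y | x) with p = sigmoid(w . x), hence
    grad_x L = (p - y) * w.
    """
    p = 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))
    sign = lambda v: (v > 0) - (v < 0)
    return [eps * sign((p - y) * wi) for wi in w]

# Made-up weights and data point.
w = [1.0, -2.0, 0.5]
x = [0.2, 0.1, 0.4]
delta = fgsm(w, x, y=1, eps=0.1)                  # step along the loss gradient
x_adv = [xi + di for xi, di in zip(x, delta)]
```

Each component of δ has magnitude exactly ε, and the perturbation lowers the logit for the true label y = 1, i.e., it increases the loss as intended.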

**JSMA Targeted Mis-classification Attack -** This algorithm is suitable for targeted misclassification [37]; we refer to it as JSMA throughout the rest of the paper. To craft the perturbation δ, components are sorted by decreasing *adversarial saliency value*. The adversarial saliency value $S(\mathbf{x}, t)[i]$ of component i for an adversarial target class t is defined as:

$$S(\mathbf{x},t)[i] = \begin{cases} 0 \text{ if } \frac{\partial s(F)[t](\mathbf{x})}{\partial \mathbf{x}[i]} < 0 \text{ or } \sum\_{j \neq t} \frac{\partial s(F)[j](\mathbf{x})}{\partial \mathbf{x}[i]} > 0\\ \frac{\partial s(F)[t](\mathbf{x})}{\partial \mathbf{x}[i]} \left| \sum\_{j \neq t} \frac{\partial s(F)[j](\mathbf{x})}{\partial \mathbf{x}[i]} \right| \text{ otherwise} \end{cases} \tag{5}$$

where the matrix $J_F = \left[\frac{\partial s(F)[j](\mathbf{x})}{\partial \mathbf{x}[i]}\right]_{ij}$ is the Jacobian matrix for the output of the softmax layer $s(F)(\mathbf{x})$. Since $\sum_{k \in C} s(F)[k](\mathbf{x}) = 1$, we have the following equation:

$$\frac{\partial s(F)[t](\mathbf{x})}{\partial \mathbf{x}[i]} = -\sum\_{j \neq t} \frac{\partial s(F)[j](\mathbf{x})}{\partial \mathbf{x}[i]}$$

The first case corresponds to the scenario where changing the i-th component of **x** takes us further away from the target label t. Intuitively, $S(\mathbf{x}, t)[i]$ indicates how likely changing the i-th component of **x** is to "move" the input towards the target label t. Input components i are added to the perturbation δ in order of decreasing adversarial saliency value $S(\mathbf{x}, t)[i]$ until the resulting adversarial sample $\mathbf{x}^\star = \mathbf{x} + \delta$ achieves the target label t. The perturbation introduced for each selected input component can vary; greater individual variations tend to reduce the number of components perturbed to achieve misclassification.
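Equation (5) can be sketched directly on a small, made-up softmax Jacobian (rows index classes, columns index input components; each column sums to 0 because the softmax outputs sum to 1):

```python
def saliency(jacobian, t):
    """Adversarial saliency values S(x, t)[i] of Eq. (5).

    jacobian[j][i] = d s(F)[j](x) / d x[i].
    """
    n = len(jacobian[0])
    values = []
    for i in range(n):
        dt = jacobian[t][i]                                  # target-class term
        others = sum(row[i] for j, row in enumerate(jacobian) if j != t)
        if dt < 0 or others > 0:
            values.append(0.0)                               # first case of Eq. (5)
        else:
            values.append(dt * abs(others))                  # second case
    return values

# Hypothetical Jacobian for 3 classes and 2 input components.
J = [[ 0.2, -0.1],
     [-0.3,  0.4],
     [ 0.1, -0.3]]
S_vals = saliency(J, t=1)
```

Component 0 gets saliency 0 (its gradient points away from the target class), while component 1 gets a positive saliency and would be perturbed first.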

**CW Targeted Mis-classification Attack.** The CW attack [5] is widely believed to be one of the most "powerful" attacks. The reason is that Carlini and Wagner cast their problem as an unconstrained optimization problem and then use a state-of-the-art solver (i.e., Adam [24]). In other words, they leverage advances in optimization for the purpose of generating adversarial examples.

In their paper, Carlini and Wagner consider a wide variety of formulations; we present the one that performs best according to their evaluation. The optimization problem corresponding to CW is as follows:

$$\begin{array}{ll} \min\_{\delta \in \mathbb{R}^n} & \mu(\delta) \\ \text{such that } F(\mathbf{x} + \delta) = t \\ \end{array}$$

CW use an existing solver (Adam [24]) and thus need to make sure that each component of **x** + δ is between 0 and 1 (i.e., a valid pixel value). Note that the other methods did not face this issue because they control the "internals" of the algorithm (whereas CW use the solver in a "black box" manner). They introduce a new vector **w** whose i-th component is defined according to the following equation:

$$\delta[i] = \frac{1}{2}(\tanh(\mathbf{w}[i]) + 1) - \mathbf{x}[i]$$

Since $-1 \le \tanh(\mathbf{w}[i]) \le 1$, it follows that $0 \le \mathbf{x}[i] + \delta[i] \le 1$. In terms of this new variable the optimization problem becomes:

$$\begin{array}{l} \min\_{\mathbf{w} \in \mathbb{R}^n} \ \mu(\frac{1}{2}(\tanh(\mathbf{w}) + 1) - \mathbf{x}) \\ \text{such that } F(\frac{1}{2}(\tanh(\mathbf{w}) + 1)) = t \end{array}$$

Next they approximate the constraint $F(\frac{1}{2}(\tanh(\mathbf{w}) + 1)) = t$ with the following function:

$$g(\mathbf{x}) = \max\left(\max\_{i \neq t} Z(F)(\mathbf{x})[i] - Z(F)(\mathbf{x})[t], -\kappa\right).$$

In the equation given above, $Z(F)$ is the input to the softmax layer (i.e., $s(F)(\mathbf{x}) = \mathrm{softmax}(Z(F)(\mathbf{x}))$) and κ is a confidence parameter (higher κ encourages the solver to find adversarial examples that are misclassified with higher confidence). The new optimization formulation is as follows:

$$\begin{array}{ll} \min\_{\mathbf{w} \in \mathbb{R}^n} & \mu(\frac{1}{2}(\tanh(\mathbf{w}) + 1) - \mathbf{x}) \\ \text{such that } g(\frac{1}{2}(\tanh(\mathbf{w}) + 1)) & \le 0 \end{array}$$

Next they incorporate the constraint into the objective function as follows:

$$\min\_{\mathbf{w}\in\mathbb{R}^n} \mu(\frac{1}{2}(\tanh(\mathbf{w}) + 1) - \mathbf{x}) + c \, g(\frac{1}{2}(\tanh(\mathbf{w}) + 1))$$

In the objective given above, the "Lagrangian variable" c > 0 is a suitably chosen constant (from the optimization literature we know that there exists c > 0 such that the optimal solutions of the last two formulations are the same).
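The tanh change of variables and the penalized objective can be sketched on a toy linear "network" (everything below is illustrative: CW actually use Adam on a DNN, whereas here plain gradient descent with numerical gradients suffices, and the matrix W, the input, and the constants c and κ are made up):

```python
import math

# Toy CW sketch: logits Z(x) = W x over 3 classes; minimize mu + c*g over
# the unconstrained variable w, keeping the best valid adversarial image.
W = [[ 2.0, -1.0],
     [-1.0,  2.0],
     [ 0.5,  0.5]]
x0 = [0.9, 0.1]                 # original input, pixels in [0, 1]
t, c, kappa = 1, 5.0, 0.1

def to_image(w):
    return [0.5 * (math.tanh(wi) + 1.0) for wi in w]   # always in [0, 1]

def logits(img):
    return [sum(a * b for a, b in zip(row, img)) for row in W]

def g(img):
    z = logits(img)
    return max(max(z[i] for i in range(len(z)) if i != t) - z[t], -kappa)

def objective(w):
    img = to_image(w)
    mu = sum((a - b) ** 2 for a, b in zip(img, x0))     # squared L2 metric
    return mu + c * g(img)

w = [math.atanh(2 * v - 1) for v in x0]                 # start at x0
best, best_dist = None, float("inf")
for _ in range(600):
    grad = []
    for i in range(len(w)):                             # numerical gradient
        wp = list(w); wp[i] += 1e-5
        wm = list(w); wm[i] -= 1e-5
        grad.append((objective(wp) - objective(wm)) / 2e-5)
    w = [wi - 0.05 * gi for wi, gi in zip(w, grad)]
    img = to_image(w)
    if g(img) <= -kappa:                                # margin achieved
        dist = sum((a - b) ** 2 for a, b in zip(img, x0))
        if dist < best_dist:
            best, best_dist = img, dist

x_adv = best                    # adversarial image classified as t
```

Note that the box constraint never needs to be enforced explicitly: every iterate of **w** maps to a valid image through the tanh reparameterization.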

#### **3.2 Adversarial Training**

Once an attacker finds an adversarial example, the algorithm can be retrained using this example. Researchers have found that retraining the model with adversarial examples produces a more robust model. For this section, we will work with attack algorithms that have a target label t (i.e., the targeted mis-classification case, as in JSMA or CW). Let $A(w, \mathbf{x}, t)$ be the attack algorithm, with inputs as follows: $w \in H$ is the current hypothesis, **x** is the data point, and $t \in C$ is the target label. The output of $A(w, \mathbf{x}, t)$ is a perturbation δ such that $F(\mathbf{x}+\delta) = t$. If the attack algorithm is simply a mis-classification algorithm (e.g., FGSM or Deepfool) we will drop the last parameter t.

An *adversarial training* algorithm $R_A(w, \mathbf{x}, t)$ is parameterized by an attack algorithm A and outputs a new hypothesis $w' \in H$. Adversarial training works by taking a data point **x** and an attack algorithm $A(w, \mathbf{x}, t)$ as its input and then retraining the model using a specially designed loss function (essentially one performs a single step of SGD using the new loss function). The question arises: which loss function should be used during retraining? Different methods use different loss functions.

Next, we discuss some adversarial training algorithms proposed in the literature. At a high level, an important point is that the more sophisticated an adversarial perturbation algorithm is, the harder it is to turn it into adversarial training. The reason is that it is hard to "encode" the adversarial perturbation algorithm as an objective function and optimize it. We will see this below, especially for the virtual adversarial training (VAT) method proposed by Miyato et al. [32].

**Retraining for FGSM.** We discussed the FGSM attack method earlier. In this case A = FGSM. The loss function used by the retraining algorithm $R_{\mathrm{FGSM}}(w, \mathbf{x}, t)$ is as follows:

$$\ell\_{\text{FGSM}}(w, \mathbf{x}\_i, y\_i) = \ell(w, \mathbf{x}\_i, y\_i) + \lambda \ell \left(w, \mathbf{x}\_i + \text{FGSM}(w, \mathbf{x}\_i), y\_i\right)$$

Recall that FGSM(w, **x**) was defined earlier, and λ is a regularization parameter. The simplicity of $\mathrm{FGSM}(w, \mathbf{x}_i)$ allows taking its gradient, but this objective function requires the label $y_i$ because we are reusing the same loss function used to train the original model. Further, $\mathrm{FGSM}(w, \mathbf{x}_i)$ may not produce a good adversarial perturbation direction (i.e., taking a bigger step in this direction might produce a distorted image). The retraining algorithm is simply as follows: *take one step of SGD using the loss function* $\ell_{\mathrm{FGSM}}$ *at the data point* $\mathbf{x}_i$.

A caveat is needed when taking the gradient during the SGD step. At iteration t, suppose we have model parameters $w_t$ and we need to compute the gradient of the objective. Note that FGSM(w, **x**) depends on w, so by the chain rule we need to compute $\partial\,\mathrm{FGSM}(w, \mathbf{x})/\partial w|_{w=w_t}$. However, this gradient is volatile<sup>2</sup>, so instead Goodfellow et al. only compute:

$$\left. \frac{\partial \ell \left( w, \mathbf{x}\_i + \text{FGSM}(w\_t, \mathbf{x}\_i), y\_i \right)}{\partial w} \right|\_{w = w\_t}$$

Essentially, they treat $\mathrm{FGSM}(w_t, \mathbf{x}_i)$ as a constant while taking the derivative.
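This retraining step can be sketched for a binary logistic-regression model standing in for the DNN (a simplifying assumption made so the gradients are closed-form). Note how the perturbation is computed at the current weights and then held constant in the gradient, mirroring the treatment by Goodfellow et al.:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

def grad_w(w, x, y):
    """Gradient in w of the logistic loss -log p(y|x), p = sigmoid(w . x)."""
    p = sigmoid(dot(w, x))
    return [(p - y) * xi for xi in x]

def retrain_step(w, x, y, eps=0.1, lam=0.5, lr=0.1):
    """One SGD step on l(w,x,y) + lam * l(w, x + FGSM(w,x), y)."""
    p = sigmoid(dot(w, x))
    sign = lambda v: (v > 0) - (v < 0)
    delta = [eps * sign((p - y) * wi) for wi in w]   # FGSM at the current w,
    x_adv = [xi + di for xi, di in zip(x, delta)]    # treated as constant below
    g = [a + lam * b for a, b in zip(grad_w(w, x, y), grad_w(w, x_adv, y))]
    return [wi - lr * gi for wi, gi in zip(w, g)]

w = [0.5, -0.5]
x, y = [1.0, 1.0], 1
w_new = retrain_step(w, x, y)
```

One step on the combined loss still reduces the clean-data loss here (the logit for the true label increases), while also penalizing the model's behavior at the FGSM point.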

**Virtual Adversarial Training (VAT).** Miyato et al. [32] observed the drawback of requiring the label $y_i$ for the adversarial example. Their intuition is that one wants the classifier to behave "similarly" on **x** and **x** + δ, where δ is the adversarial perturbation; specifically, the distance between the distributions output by the softmax layer of $F_w$ on **x** and on **x** + δ should be small. VAT uses the *Kullback-Leibler (KL) divergence* as the measure of the distance between two distributions. Recall that the KL divergence of two distributions P and Q over the same finite domain D is given by the following equation:

<sup>2</sup> In general, second-order derivatives of a classifier corresponding to a DNN vanish at several points because several layers are piece-wise linear.

$$\text{KL}(P, Q) = \sum\_{i \in D} P(i) \log \left( \frac{P(i)}{Q(i)} \right)$$

Therefore, instead of reusing the loss $\ell$, they propose to use the following regularizer:

$$\Delta(r, \mathbf{x}, w) = \text{KL}\left(s(F\_w)(\mathbf{x})[y], s(F\_w)(\mathbf{x} + r)[y]\right)$$

for some r such that $\|r\| \le \delta$. As a result, the label $y_i$ is *no longer* required. The question is: which r should one use? Miyato et al. [32] propose that in theory one should use the "best" one:

$$\max\_{r:\|r\| \le \delta} \text{KL}\left(s(F\_w)(\mathbf{x})[y], s(F\_w)(\mathbf{x} + r)[y]\right)$$

This thus gives rise to the following loss function to use during retraining:

$$\ell\_{\text{VAT}}(w, \mathbf{x}\_i, y\_i) = \ell(w, \mathbf{x}\_i, y\_i) + \lambda \max\_{r: ||r|| \le \delta} \Delta(r, \mathbf{x}\_i, w)$$

However, one cannot easily compute the gradient for the regularizer. Hence the authors perform an approximation as follows:


$$\ell\_{\text{VAT}}(w, \mathbf{x}\_i, y\_i) = \ell(w, \mathbf{x}\_i, y\_i) + \lambda \Delta(r^\*, \mathbf{x}\_i, w)$$

where $r^*$ denotes the chosen approximation of the maximizing perturbation. Now suppose that in the process of SGD we are at iteration t with model parameters $w_t$, and we need to compute $\partial \ell_{\mathrm{VAT}}/\partial w|_{w=w_t}$. By the chain rule we would need to compute $\partial r^*/\partial w|_{w=w_t}$. However, the authors find that such gradients are volatile, so they instead fix $r^*$ as a constant at the point $w_t$ and compute:

$$\left. \frac{\partial \text{KL}\left(s(F\_w)(\mathbf{x})[y], s(F\_w)(\mathbf{x} + r)[y]\right)}{\partial w} \right|\_{w = w\_t}$$
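The KL regularizer can be sketched with a toy linear softmax model. Here crude random search stands in for the power-iteration approximation of Miyato et al., purely to keep the sketch self-contained; the matrix W, the bound δ, and the trial count are all made up:

```python
import math
import random

W = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]     # toy logits Z(x) = W x

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def predict(x):
    return softmax([sum(a * b for a, b in zip(row, x)) for row in W])

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def vat_regularizer(x, delta=0.1, trials=200, seed=0):
    """Approximate max over ||r|| <= delta of KL(s(x), s(x + r)) by random search."""
    rng = random.Random(seed)
    p = predict(x)
    best = 0.0
    for _ in range(trials):
        r = [rng.gauss(0, 1) for _ in x]
        norm = math.sqrt(sum(v * v for v in r))
        r = [delta * v / norm for v in r]
        best = max(best, kl(p, predict([a + b for a, b in zip(x, r)])))
    return best

x = [1.0, 0.2]
reg = vat_regularizer(x)       # note: no label y is needed anywhere
```

The key property visible in the sketch is exactly the one VAT exploits: the regularizer depends only on the model's own output distribution, never on the ground-truth label.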

#### **3.3 Black Box Attacks**

Recall that the earlier attacks (e.g., FGSM and JSMA) needed white-box access to the classifier F (essentially because these attacks require first-order information about the classifier). In this section, we present black-box attacks, where an attacker can *only* ask for the labels F(**x**) of chosen data points. Our presentation is based on [36], but is more general.

Let $A(w, \mathbf{x}, t)$ be the attack algorithm, where its inputs are: $w \in H$, the current hypothesis; **x**, the data point; and $t \in C$, the target label. The output of $A(w, \mathbf{x}, t)$ is a perturbation δ such that $F(\mathbf{x} + \delta) = t$. If the attack algorithm is simply a mis-classification algorithm (e.g., FGSM or Deepfool) we drop the last parameter t (recall that in this case the attack algorithm returns a δ such that $F(\mathbf{x} + \delta) \ne F(\mathbf{x})$). An *adversarial training* algorithm $R_A(w, \mathbf{x}, t)$ is parameterized by an attack algorithm A and outputs a new hypothesis $w' \in H$ (as discussed in the previous subsection).

*Initialization:* We pick a substitute classifier G and an initial seed data set $S_0$, and train G. For simplicity, we will assume that the sample space $Z = X \times Y$ and the hypothesis space H for G are the same as those of F (the classifier under attack); however, this is not crucial to the algorithm. We call G the *substitute classifier* and F the *target classifier*. Let $S = S_0$ be the initial data set, which will be updated as we iterate.

*Iteration:* Run the attack algorithm $A(w, \mathbf{x}, t)$ on G and obtain a δ. If $F(\mathbf{x} + \delta) = t$, then **stop**: we are done. If $F(\mathbf{x} + \delta) = t'$ for some $t' \ne t$, we augment the data set S as follows:

$$S = S \cup \{(\mathbf{x} + \delta, t')\}$$

We then retrain G on this new data set, which essentially means running SGD on the new data point $(\mathbf{x} + \delta, t')$. Notice that we could also use adversarial training $R_A(w, \mathbf{x}, t)$ to update G (to our knowledge this has not been tried in the literature).
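The substitute-classifier loop can be sketched end to end. Everything below is illustrative: both classifiers are 1-nearest-neighbor models, and the attack A is a crude random search on the substitute G standing in for a gradient-based white-box attack such as FGSM:

```python
import random

class NearestNeighbor:
    """Minimal 1-NN classifier used as a stand-in for both F and G."""
    def __init__(self, X, y):
        self.X, self.y = [list(p) for p in X], list(y)
    def predict(self, x):
        d = [sum((a - b) ** 2 for a, b in zip(p, x)) for p in self.X]
        return self.y[d.index(min(d))]

def attack(G, x, t, rng, eps=1.0, trials=100):
    """A(w, x, t): search for delta with G(x + delta) = t (white-box on G)."""
    for _ in range(trials):
        delta = [eps * rng.gauss(0, 1) for _ in x]
        if G.predict([a + b for a, b in zip(x, delta)]) == t:
            return delta
    return None

rng = random.Random(1)
F = NearestNeighbor([[0, 0], [2, 2]], ["cat", "dog"])     # target (black box)
X, Y = [[0.2, 0.1], [1.8, 2.1]], ["cat", "dog"]           # seed set S_0
G = NearestNeighbor(X, Y)                                  # substitute

x, t = [0.3, 0.2], "dog"
success = False
for _ in range(25):
    delta = attack(G, x, t, rng)
    if delta is None:
        continue
    x_adv = [a + b for a, b in zip(x, delta)]
    label = F.predict(x_adv)         # one black-box query to F
    if label == t:
        success = True               # the perturbation transfers to F
        break
    X.append(x_adv); Y.append(label)
    G = NearestNeighbor(X, Y)        # retrain G on (x + delta, t')
```

The only access to F is through `F.predict`; all gradient-free search happens against the substitute G, which is refined with F's answers whenever a perturbation fails to transfer.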

#### **3.4 Defenses**

Defenses with formal guarantees against test-time attacks have proven elusive. For example, Carlini and Wagner [6] have a recent paper that breaks *ten recent defense proposals*. However, defenses that are based on robust-optimization objectives have demonstrated promise [26,33,43]. Several techniques for verifying properties of a DNN (in isolation) have appeared recently (e.g., [12,13,19,23]). Due to space limitations we will not give a detailed account of all these defenses.

#### **4 Semantic Adversarial Analysis and Training**

A central tenet of this paper is that the analysis of deep neural networks (and machine learning components in general) must be more *semantic*. In particular, we advocate for the increased use of semantics in several aspects of adversarial analysis and training, including the use of semantic modification spaces, system-level formal specifications, semantic counterexamples for training, and the confidence levels produced by ML models in downstream decision making.


#### **4.1 Compositional Falsification**

We discuss the problem of performing system-level analysis of a deep learning component, using recent work by the authors [9,10] to illustrate the main points. The material in this section is mainly based on [40].

We begin with some basic notation. Let S denote the model of the full system under verification, E denote a model of its environment, and Φ denote the specification to be verified. C is an ML model (e.g., a DNN) that is part of S. As in Sect. 3, let **x** be an input to C. We assume that Φ is a trace property, i.e., a set of behaviors of the closed system obtained by composing S with E, denoted $S \| E$. The goal of falsification is to find one or more counterexamples showing how the composite system $S \| E$ violates Φ. In this context, *semantic analysis of C is about finding a modification δ from a space of semantic modifications Δ such that C, on **x** + δ, produces a misclassification that causes $S \| E$ to violate Φ*.

**Fig. 1.** Automatic Emergency Braking System (AEBS) in closed loop. An image classifier based on deep neural networks is used to perceive objects in the ego vehicle's frame of view.

**Example Problem.** As an illustrative example, consider a simple model of an Automatic Emergency Braking System (AEBS), that attempts to detect objects in front of a vehicle and actuate the brakes when needed to avert a collision. Figure 1 shows the AEBS as a system composed of a controller (automatic braking), a plant (vehicle sub-system under control, including transmission), and an advanced sensor (camera along with an obstacle detector based on deep learning). The AEBS, when combined with the vehicle's environment, forms a closed loop control system. The controller regulates the acceleration and braking of the plant using the velocity of the subject (ego) vehicle and the distance between it and an obstacle. The sensor used to detect the obstacle includes a camera along with an image classifier based on DNNs. In general, this sensor can provide noisy measurements due to incorrect image classifications which in turn can affect the correctness of the overall system.

Suppose we want to verify whether the distance between the ego vehicle and a preceding obstacle is always larger than 2 m. In STL, this requirement Φ can be written as $G_{[0,T]}(\|\mathbf{x}_{ego} - \mathbf{x}_{obs}\|_2 \ge 2)$. Such verification requires the exploration of a very large input space comprising the control inputs (e.g., acceleration and braking pedal angles) and the machine learning (ML) component's feature space (e.g., all the possible pictures observable by the camera). The latter space is particularly large; for example, the feature space of RGB images of dimension 1000 × 600 px (for an image classifier) contains $256^{1000 \times 600 \times 3}$ elements.

In the above example, $S \| E$ is the closed-loop system in Fig. 1, where S comprises the DNN and the controller, and E comprises everything else. C is the DNN used for object detection and classification.

This case study has been implemented in Matlab/Simulink<sup>3</sup> in two versions that use two different Convolutional Neural Networks (CNNs): the Caffe [20] version of AlexNet [28] and the Inception-v3 model created with Tensorflow [31], both trained on the ImageNet database [1]. Further details about this example can be obtained from [9].

**Approach.** A key idea in our approach is to have a *system-level verifier* that abstracts away the component C while verifying Φ on the resulting abstraction. This system-level verifier communicates with a component-level analyzer that searches for semantic modifications δ to the input **<sup>x</sup>** of C that could lead to violations of the system-level specification Φ. Figure <sup>2</sup> illustrates this approach.


**Fig. 2.** Compositional verification approach. A system-level verifier cooperates with a component-level analysis procedure (e.g., adversarial analysis of a machine learning component to find misclassifications).

We now formalize this approach while trying to convey the intuition. Let T denote the set of all possible traces of the composition of the system with its environment, $S \| E$. Given a specification Φ, let $T_\Phi$ denote the set of traces in T satisfying Φ. Let $U_\Phi$ denote the projection of these traces onto the state and interface variables of the environment E. $U_\Phi$ is termed the *validity domain* of Φ, i.e., the set of environment behaviors for which Φ is satisfied. Similarly, the complement set $U_{\neg\Phi}$ is the set of environment behaviors for which Φ is violated.

Our approach works as follows:

1. The System-level Verifier initially performs two analyses with two extreme abstractions of the ML component. First, it performs an *optimistic* analysis, wherein the ML component is assumed to be a "perfect classifier", i.e., all feature vectors are correctly classified. In situations where ML is used for perception/sensing, this abstraction assumes perfect perception/sensing. Using this abstraction, we compute the validity domain for this abstract model of the system, denoted $U_\Phi^+$. Next, it performs a *pessimistic* analysis where the ML component is abstracted by a "completely-wrong classifier", i.e., all feature vectors are misclassified. Denote the resulting validity domain as $U_\Phi^-$. It is expected that $U_\Phi^+ \supseteq U_\Phi^-$.

<sup>3</sup> https://github.com/dreossi/analyzeNN.

2. Abstraction permits the System-level Verifier to operate on a lower-dimensional search space and identify a region in this space that may be affected by the malfunctioning of component C, a so-called "region of uncertainty" (ROU). This region, $U_{ROU}^C$, is computed as $U_\Phi^+ \setminus U_\Phi^-$. In other words, it comprises all environment behaviors that could lead to a system-level failure when component C malfunctions. This region $U_{ROU}^C$, projected onto the inputs of C, is communicated to the ML Analyzer. (Concretely, in the context of our example of Sect. 4.1, this corresponds to finding a subspace of images that corresponds to $U_{ROU}^C$.)


The communication between the System-level Verifier and the Component-level (ML) Analyzer continues in this fashion until we either prove/disprove Φ or run out of resources.
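The optimistic/pessimistic abstraction and the resulting region of uncertainty can be sketched over a discretized environment space. The "system" below is a hypothetical braking-distance check invented for illustration, not the authors' AEBS model:

```python
import itertools

distances = range(0, 20, 2)       # initial distance to obstacle (m)
speeds = range(0, 25, 5)          # ego vehicle speed (m/s)

def safe(dist, speed, perfect_perception):
    """Hypothetical closed-loop check: with a perfect classifier braking
    starts immediately; with a completely-wrong classifier it starts one
    second late, costing an extra `speed` metres of travel."""
    braking_distance = speed * speed / 20.0
    margin = braking_distance if perfect_perception else braking_distance + speed
    return dist >= 2 + margin

env = set(itertools.product(distances, speeds))
U_plus = {e for e in env if safe(*e, perfect_perception=True)}    # optimistic
U_minus = {e for e in env if safe(*e, perfect_perception=False)}  # pessimistic
rou = U_plus - U_minus     # environment behaviors where C's errors matter
```

As expected, the optimistic domain contains the pessimistic one, and their set difference is exactly the region that would be handed to the ML Analyzer.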

**Sample Results.** We have applied the above approach to the problem of *compositional falsification* of cyber-physical systems (CPS) with machine learning components [9]. For this class of CPS, including those with highly non-linear dynamics and even black-box components, simulation-based falsification of temporal logic properties is an approach that has proven effective in industrial practice (e.g., [21,46]). We present here a sample of results on the AEBS example from [9], referring the reader to more detailed descriptions in the other papers on the topic [9,10].

In Fig. 4 we show one result of our analysis for the Inception-v3 deep neural network. This figure shows both correctly classified and misclassified images on a range of synthesized images where (i) the environment vehicle is moved away from or towards the ego vehicle (along z-axis), (ii) it is moved sideways along

**Fig. 3.** Machine Learning Analyzer: Searching the Semantic Modification Space. A concrete semantic modification space (top left) is mapped into a discrete abstract space. Systematic sampling, using low-discrepancy methods, yields points in the abstract space. These points are concretized and the NN is evaluated on them to ascertain if they are correctly or wrongly classified. The misclassifications are fed back for system-level analysis.

the road (along x-axis), or (iii) the brightness of the image is modified. These modifications constitute the 3 axes of the figure. Our approach finds misclassifications that do not lead to system-level property violations and also misclassifications that do lead to such violations. For example, Fig. 4 shows two misclassified images, one with an environment vehicle that is too far away to be a safety hazard, as well as another image showing an environment vehicle driving slightly on the wrong side of the road, which is close enough to potentially cause a violation of the system-level safety property (of maintaining a safe distance from the ego vehicle).

For further details about this and other results with our approach, we refer the reader to [9,10].

#### **4.2 Semantic Training**

In this section we discuss two ideas for *semantic training and retraining* of deep neural networks. We first discuss the use of *hinge loss* as a way of incorporating confidence levels into the training process. Next, we discuss how system-level counterexamples and associated misclassifications can be used in the retraining process to both improve the accuracy of ML models and also to gain more assurance in the overall system containing the ML component. A more detailed study

**Fig. 4.** Misclassified images for Inception-v3 neural network (trained on ImageNet with TensorFlow). Red crosses are misclassified images and green circles are correctly classified. Our system-level analysis finds a corner-case image that could lead to a system-level safety violation. (Color figure online)

of using misclassifications (ML component-level counterexamples) to improve the accuracy of the neural network is presented in [11]; this approach is termed *counterexample-guided data augmentation*, inspired by counterexample-guided abstraction refinement (CEGAR) [7] and similar paradigms.

**Experimental Setup.** As in the preceding section, we consider an Automatic Emergency Braking System (AEBS) using a DNN-based object detector. However, in these experiments we use an AEBS deployed within Udacity's self-driving car simulator, as reported in our previous work [10].<sup>4</sup> We modified the Udacity simulator to focus exclusively on braking. In our case studies, the car follows predefined way-points, while acceleration and braking are controlled by the AEBS connected to a convolutional neural network (CNN). In particular, whenever the CNN detects an obstacle in the images provided by the onboard camera, the AEBS triggers a braking action that slows the vehicle down and avoids collision with the obstacle.

We designed and implemented a CNN to predict the presence of a cow on the road. Given an image taken by the onboard camera, the CNN classifies the picture into either the "cow" or "not cow" category. The CNN architecture is shown in Fig. 5. It consists of eight layers: the first six are alternations of convolutions and max-pools with ReLU activations; the last two are a fully connected layer and a softmax that outputs the network prediction (a confidence level for each label).

We generated a data set of 1000 road images with and without cows. We split the data set into 80% training and 20% validation data. Our model was implemented and trained using the Tensorflow library with the cross-entropy cost function and the Adam optimizer (learning rate $10^{-4}$). The model

<sup>4</sup> Udacity's self-driving car simulator: https://github.com/udacity/self-driving-car-sim.

**Fig. 5.** CNN architecture.

**Fig. 6.** Udacity simulator with a CNN-based AEBS in action.

reached 95% accuracy on the test set. Finally, the resulting CNN was connected to the Unity simulator via the Socket.IO protocol.<sup>5</sup> Figure 6 depicts a screenshot of the simulator with the AEBS in action in proximity of a cow.

**Hinge Loss.** In this section, we investigate the relationship between multiclass hinge loss functions and adversarial examples. *Hinge loss* is defined as follows:

$$l(\hat{y}) = \max(0, k + \max\_{i \neq l}(\hat{y}\_i) - \hat{y}\_l) \tag{6}$$

where (**x**, y) is a training sample, $\hat{y} = F(\mathbf{x})$ is the prediction, and l is the *ground truth* label of **x**. For this section, the output $\hat{y}$ is a numerical value indicating the *confidence level* of the network for each class. For example, $\hat{y}$ can be the output of a softmax layer as described in Sect. 2.

<sup>5</sup> Socket.IO protocol: https://github.com/socketio.

Consider what happens as we vary k. Suppose there is an $i \ne l$ s.t. $\hat{y}_i > \hat{y}_l$. Pick the largest such i and call it $i^*$. For k = 0, we incur a loss of $\hat{y}_{i^*} - \hat{y}_l$ for the example (**x**, y). However, as we make k more negative, we increase the tolerance for "misclassifications" produced by the DNN F. Specifically, we incur no penalty for a misclassification as long as the associated confidence level deviates from that of the ground truth label by no more than |k|. The larger the absolute value of k, the greater the tolerance. Intuitively, this biases the training process towards avoiding "high confidence misclassifications".
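Equation (6) and the effect of k can be checked on a made-up confidence vector in which the network's top class disagrees with the ground truth:

```python
def hinge_loss(y_hat, l, k):
    """Multiclass hinge loss of Eq. (6): max(0, k + max_{i != l} y_hat[i] - y_hat[l])."""
    worst_other = max(v for i, v in enumerate(y_hat) if i != l)
    return max(0.0, k + worst_other - y_hat[l])

y_hat = [0.2, 0.5, 0.3]      # made-up confidence levels
l = 2                         # ground truth; the network prefers class 1

loss_k0 = hinge_loss(y_hat, l, k=0.0)     # penalizes the misclassification
loss_neg = hinge_loss(y_hat, l, k=-0.5)   # confidence gap 0.2 <= |k|: no penalty
```

With k = 0 the misclassification costs exactly the confidence gap (0.2); with k = -0.5 the same mistake incurs no loss, illustrating the increased tolerance for low-confidence misclassifications.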

In this experiment, we investigate the role of k and explore different parameter values. At training time, we want to minimize the mean hinge loss across all training samples. We trained the CNN described above with different values of k and evaluated its precision on both the original test set and a set of counterexamples generated for the original model, i.e., the network trained with cross-entropy loss.

Table 1 reports accuracy and log loss for different values of k on both the original and counterexample test sets ($T_{original}$ and $T_{countex}$, respectively).


**Table 1.** Hinge loss with different *k* values.

Table 1 shows interesting results. We note that a negative k increases the accuracy of the model on counterexamples. In other words, biasing the training process by penalizing high-confidence misclassifications improves accuracy on counterexamples! However, the price to pay is a reduction of accuracy on the original test set. This is still a very preliminary result, and further experimentation and analysis are necessary.

**System-Level Counterexamples.** Using the compositional falsification framework presented in Sect. 4.1, we identify orientations, displacements on the x-axis, and colors of an obstacle that lead to a collision of the vehicle with the obstacle. Figure 7 depicts configurations of the obstacle that lead to specification violations, and hence to collisions.

In an experiment, we augment the original training set with the elements of T_countex, i.e., the images of the original test set T_original that are misclassified by the original model (see Sect. 4.2).

We trained the model with both cross-entropy and hinge loss for 20 epochs. Both models achieve a high accuracy on the validation set (≈92%). However,

**Fig. 7.** Semantic counterexamples: obstacle configurations leading to property violations (in red). (Color figure online)

when plugged into the AEBS, neither of these models prevents the vehicle from colliding with an adversarially configured obstacle. This seems to indicate that simply retraining with some semantic (system-level) counterexamples, generated by analyzing the system containing the ML model, may not be sufficient to eliminate all semantic counterexamples.

Interestingly, though, it appears that in both cases the impact of the vehicle with the obstacle occurs at a lower speed than with the original model. In other words, the AEBS system starts detecting the obstacle earlier than with the original model, and therefore starts braking earlier as well. This means that despite the specification violations, the counterexample retraining procedure seems to help limit the damage in case of a collision. Coupled with a run-time assurance framework (see [41]), semantic retraining could help mitigate the impact of misclassifications on the system-level behavior.

#### **5 Conclusion**

In this paper, we surveyed the field of adversarial machine learning with a special focus on deep learning and on test-time attacks. We then introduced the idea of *semantic adversarial machine (deep) learning*, where adversarial analysis and training of ML models is performed using the semantics and context of the overall system within which the ML models are utilized. We identified several ideas for integrating semantics into adversarial learning, including using a semantic modification space, system-level formal specifications, training using semantic counterexamples, and utilizing more detailed information about the outputs produced by the ML model, including confidence levels, in the modules that use these outputs to make decisions. Preliminary experiments show the promise of these ideas, but also indicate that much remains to be done. We believe the field of semantic adversarial learning will be a rich domain for research at the intersection of machine learning, formal methods, and related areas.

**Acknowledgments.** The first and third authors were supported in part by NSF grant 1646208, the DARPA BRASS program under agreement number FA8750-16-C0043, the DARPA Assured Autonomy program, and Berkeley Deep Drive.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **From Programs to Interpretable Deep Models and Back**

Eran Yahav

Technion, Haifa, Israel yahave@cs.technion.ac.il

**Abstract.** We demonstrate how deep learning over programs is used to provide (preliminary) augmented programmer intelligence. In the first part, we show how to tackle tasks like code completion, code summarization, and captioning. We describe a general path-based representation of source code that can be used across programming languages and learning tasks, and discuss how this representation enables different learning algorithms. In the second part, we describe techniques for extracting interpretable representations from deep models, shedding light on what has actually been learned in various tasks.

#### **1 Introduction**

We describe a journey from programs to interpretable deep models, and back. First, we show how to apply neural networks to learn interesting facts about programs, and build (interpretable) models for several programming-related tasks. Then, we show how to extract finite-state automata from a given recurrent neural network, providing some insight on what a network has actually learned.

#### **1.1 Motivating Tasks**

**Semantic Labeling of Code Snippets.** Consider the code snippet of Fig. 1. This snippet only contains low-level assignments to arrays, but a human reading the code may (correctly) label it as performing the *reverse* operation. Our goal is to predict such labels automatically. The right-hand side of Fig. 1 shows the labels predicted automatically using our approach. The most likely prediction (77.34%) is *reverseArray*. Alon et al. [3] provide additional examples.

Intuitively, this problem is hard because it requires *learning a correspondence* between the *entire content of a code snippet* and a semantic label. That is, it requires aggregating possibly hundreds of expressions and statements from the snippet into a single, descriptive label.

E. Yahav—Joint work with Uri Alon, Yoav Goldberg, Omer Levy, Gail Weiss, and Meital Zilberstein.

© The Author(s) 2018

H. Chockler and G. Weissenbacher (Eds.): CAV 2018, LNCS 10981, pp. 27–37, 2018. https://doi.org/10.1007/978-3-319-96145-3_2

**Fig. 1.** A code snippet and its predicted labels as computed by our model.

**Fig. 2.** A code snippet and its predicted caption as computed by our model.

**Captioning Code Snippets.** Consider the short code snippet of Fig. 2. The goal of *code captioning* is to assign a natural language caption that captures the task performed by the snippet. For the example of Fig. 2 our approach automatically predicts the caption *"get the text of a pdf file in C#"*. Intuitively, this task is harder than semantic labeling, as it requires the generation of a natural language sentence in addition to capturing (something about) the meaning of the code snippet.

**Fig. 3.** A code snippet and its predicted completion as computed by our model.

**Code Completion.** Consider the code of Fig. 3. Our code completion automatically predicts the next steps in the code: ok.newCall(request).execute(). This task requires prediction of the missing part of the code based on a given context. Technically, this can be expressed as predicting a completion of a partial abstract syntax tree.

In the next section, we show how techniques based on neural networks address all of these tasks, as well as other programming-related tasks.

#### **2 From Programs to Deep Models**

#### **2.1 Representation**

Leveraging machine learning models for predicting program properties such as variable names, method names, and expression types is a topic of much recent interest [1,2,6,8,9]. These techniques are based on learning a statistical model from a large amount of code and using the model to make predictions in new programs. A major challenge in these techniques is how to represent instances of the input space to facilitate learning [10]. Designing a program representation that enables effective learning is a critical task that is *often done manually for each task and programming language*.

*Our Approach.* We present a program representation for learning from programs. Our approach uses different *path-based abstractions of the program's abstract syntax tree*. This family of path-based representations is natural, general, fully automatic, and works well across different tasks and programming languages.

**Fig. 4.** A JavaScript program and its AST, along with an example of one of the paths.

*AST Paths.* We define AST paths as paths between nodes in a program's abstract syntax tree (AST). To automatically generate paths, we first parse the program to produce an AST, and then extract paths between nodes in the tree. We represent a path in the AST as a sequence of nodes connected by up and down movements, and represent a program element as the set of paths that its occurrences participate in. Figure 4a shows an example JavaScript program. Figure 4b shows its AST, and one of the extracted paths. The path from the first occurrence of the variable d to its second occurrence can be represented as:

SymbolRef ↑ UnaryPrefix! ↑ While ↓ If ↓ Assign= ↓ SymbolRef

This is an example of a pairwise path between leaves in the AST, but in general the family of path-based representations contains n-wise paths, which do not necessarily span between leaves and do not necessarily contain all the nodes in between. We consider several choices of subsets of this family in [4].
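The pairwise case can be sketched concretely. Given a toy AST (the tuple encoding below is hypothetical, not the representation used by our tools), a path between two leaf occurrences is assembled from up-movements to their lowest common ancestor followed by down-movements:

```python
# A toy AST as (label, children) tuples; leaves have empty child lists.
tree = ("While", [
    ("UnaryPrefix!", [("SymbolRef", [])]),
    ("If", [("Assign=", [("SymbolRef", [])])]),
])

def leaf_paths(node):
    """Yield (leaf_label, root-to-leaf label path) for every leaf."""
    label, children = node
    if not children:
        yield label, [label]
    for child in children:
        for leaf, path in leaf_paths(child):
            yield leaf, [label] + path

def pairwise_path(path_a, path_b):
    """Connect two leaves through their lowest common ancestor (LCA):
    up-movements from the first leaf, then down-movements to the second."""
    i = 0  # length of the common prefix; the LCA sits at index i - 1
    while i < min(len(path_a), len(path_b)) and path_a[i] == path_b[i]:
        i += 1
    ups = path_a[i - 1:][::-1]   # first leaf .. LCA
    downs = path_b[i:]           # child of LCA .. second leaf
    return " ↑ ".join(ups) + " ↓ " + " ↓ ".join(downs)

paths = [p for _, p in leaf_paths(tree)]
print(pairwise_path(paths[0], paths[1]))
# SymbolRef ↑ UnaryPrefix! ↑ While ↓ If ↓ Assign= ↓ SymbolRef
```

Run on this toy tree, the sketch reproduces the path between the two occurrences of d shown above.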

Using a path-based representation has several major advantages:


#### **2.2 Code2vec: Learning Code Embeddings**

In [3], we present a framework for predicting program properties using neural networks. The main idea is a neural network that learns *code embeddings*: continuous distributed vector representations for code. The code embeddings allow us to model the correspondence between code snippets and labels in a natural and effective manner. By learning code embeddings, our long-term goal is to enable the application of neural techniques to a wide range of programming-language tasks. A live demo of the framework is available at https://code2vec.org.

Our neural network architecture uses a representation of code snippets that *leverages the structured nature of source code*, and learns to aggregate multiple syntactic paths into a single vector. This ability is fundamental for the application of deep learning to programming languages. By analogy, word embeddings in natural language processing (NLP) started a revolution in the application of deep learning to NLP tasks.

The input to our model is a code snippet and a corresponding tag, label, caption, or name. This tag expresses the semantic property that we wish the network to model, for example: a tag or name that should be assigned to the snippet, or the name of the method, class, or project that the snippet was taken from. Let C be the code snippet and L be the corresponding label or tag. Our underlying hypothesis is that *the distribution of labels can be inferred from syntactic paths in* C. Our model therefore attempts to learn the tag distribution, conditioned on the code: P (L|C).

*Model.* For the full details of the model, see [3]. At a high level, the key point is that a code snippet is composed of a bag of contexts, and each context is represented by a vector whose values are learned. The values of this vector capture two distinct goals: (i) the semantic meaning of this context, and (ii) the amount of attention this context should get.

The problem is as follows: given an arbitrarily large number of context vectors, we need to aggregate them into a single vector. Two trivial approaches would be to learn to select the most important one of them, or to use all of them by vector-averaging. Both alternatives are shown to yield poor results (see [3]).

Our main observation is that *all* context vectors need to be used, but the model should learn how much focus to give each vector. This is done by learning how to average the context vectors in a weighted manner. The weighted average is obtained by weighting each vector by its dot product with a global attention vector. The context vectors and the attention vector are trained and learned *simultaneously*, using the standard neural approach of backpropagation.
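A minimal numpy sketch of this attention-weighted aggregation follows; the values are fixed here purely for illustration, whereas in the real model both the context vectors and the attention vector are learned:

```python
import numpy as np

def attend(context_vectors, attention_vector):
    """Weighted aggregation of a bag of context vectors into one code vector."""
    scores = context_vectors @ attention_vector      # dot product per context
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                         # softmax -> attention weights
    return weights @ context_vectors                 # weighted average

contexts = np.array([[1.0, 0.0],                     # three 2-d context vectors
                     [0.0, 1.0],
                     [1.0, 1.0]])
a = np.array([2.0, 0.0])                             # attention favours dimension 0
code_vector = attend(contexts, a)
print(code_vector)
```

Contexts aligned with the attention vector dominate the resulting code vector, but every context contributes with some nonzero weight.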

*Interpreting Attention.* Despite the "black-box" reputation of neural networks, our model is partially interpretable thanks to the attention mechanism, which allows us to visualize the distribution of weights over the bag of path-contexts. Figures 5 and 6 illustrate a few predictions, along with the path-contexts that were given the most attention in each method. The width of each visualized path is proportional to the attention weight it was allocated. We note that in these figures each path is shown only as a line connecting tokens, while in fact it contains rich syntactic information that is not properly expressed in the figures.

**Fig. 5.** Predictions and attention paths for the program of Fig. 1. The width of a path is proportional to its attention.

**Fig. 6.** Example predictions from our model. The width of a path is proportional to its attention.

The examples of Figs. 5 and 6 are interesting since the top names are accurate and descriptive (reverseArray and reverse; isPrime; sort and bubbleSort) but do not appear explicitly in the code snippets. The code snippets, and specifically the most-attended path-contexts, describe lower-level operations. Suggesting a descriptive name for each of these methods is difficult and might take time even for a trained human programmer.

#### **2.3 Code2seq: Generating Sequences from Structured Representations of Code**

In contrast to classical (and widespread) seq2seq models for translation, we introduce a new model that performs encoding over source code, and decoding to natural language.

Following [3,4], we introduce an approach for encoding source code that leverages the unique syntactic structure of programming languages. We represent a given code snippet as a set of paths over its abstract syntax tree (AST), where each path is compressed to a fixed-length vector. During decoding, code2seq attends over a different weighted sum of the path-vectors to produce each output token, much like NMT models attend over contextualized token representations in the source sentence. A live demo of the framework is available at https://code2seq.org.

#### **3 From Deep Models to Automata**

In this section, we focus on the extraction of finite-state automata from recurrent neural networks (RNNs). In recent years, there has been significant interest in the use of RNNs for learning languages. Like other supervised machine learning techniques, RNNs are trained on a large set of examples of the target concept. While neural networks can reasonably approximate a variety of languages, and even precisely represent a regular language [5], in practice they are unlikely to generalize exactly to the concept being trained, and *what they eventually learn in actuality is unclear* [7]. Our goal in this work is to provide some insight into what a given trained network has actually learned, without requiring changes to the network architecture or access to the original training data.

*Recurrent Neural Networks.* Recurrent neural networks (RNNs) are a class of neural networks used to process sequences of arbitrary length. When operating over sequences over a discrete alphabet, the input sequence is fed into the RNN on a symbol-by-symbol basis. For each input symbol, the RNN outputs a *state vector* representing the sequence up to that point. A state vector and an input symbol are combined to produce the next state vector. The RNN is essentially a parameterized mathematical function that takes as input a state vector and an input vector, and produces a new state vector. The state vectors can be passed to a classification component that is used to produce a binary or multi-class classification decision. The RNN is trainable, and, when trained together with the classification component, the training procedure drives the state vectors to provide a representation of the prefix which is informative for the classification task being trained. We call a combination of an RNN and a classification component an *RNN-acceptor*.
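A minimal RNN-acceptor can be sketched in a few lines of numpy; the parameters below are random and purely illustrative, not those of a trained network:

```python
import numpy as np

rng = np.random.default_rng(0)
H, V = 4, 3                          # hidden size and alphabet size (illustrative)
W = rng.normal(size=(H, H))          # state-to-state weights
U = rng.normal(size=(H, V))          # input-to-state weights
b = np.zeros(H)
w_out = rng.normal(size=H)           # binary classification component

def step(h, x_onehot):
    """Combine a state vector and an input symbol into the next state vector."""
    return np.tanh(W @ h + U @ x_onehot + b)

def accepts(symbols):
    """Run the RNN-acceptor over a sequence of symbol indices."""
    h = np.zeros(H)                  # initial state
    for s in symbols:
        x = np.zeros(V)
        x[s] = 1.0                   # one-hot encoding of the input symbol
        h = step(h, x)
    return bool(w_out @ h > 0)       # accept/reject decision on the final state

print(accepts([0, 1, 2]))
```

The sketch makes the state-machine view explicit: a fixed initial state, a deterministic transition on each symbol, and a classification of every reachable state.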

A trained RNN-acceptor can be seen as a state machine in which the states are high-dimensional vectors: it has an initial state, a well defined transition function between internal states, and a well defined classification for each internal state.

*Problem Definition.* Given an RNN-acceptor R trained to accept or reject sequences over an alphabet Σ, our goal is to extract a deterministic finite-state automaton (DFA) A that mimics the behavior of R. That is, our goal is to extract a DFA A such that the language L ⊆ Σ* of sequences accepted by A is observably equivalent to that accepted by R. Intuitively, we would like to obtain a DFA that accepts *exactly* the same language as the network, but this is generally impossible in practice, as we do not know in advance any bound on the maximum sample length necessary to observe all of its behavior.

*Extraction Using Queries and Counterexamples.* In [11], we present a framework for extracting a finite-state automaton from a given RNN. The main idea is to use the L* learning algorithm to learn an automaton while using the RNN as the teacher.

**Fig. 7.** Two DFAs resembling, but not perfectly matching, the correct DFA for the regular language of tokenized JSON lists, ([])|([[S0NTF](,[S0NTF])*])$. DFA (a) is almost correct, but also accepts list-like sequences in which the last item is missing, i.e., there is a comma followed directly by the closing bracket. DFA (b) is returned by L* after the teacher (network) rejects (a), but is also not a correct representation of the target language: it treats the sequence [, as a legitimate list item equivalent to the characters S, 0, N, T, F.

#### **3.1 What Has a Network Learned?**

*Tokenized JSON Lists.* We trained a GRU network with 2 layers and hidden size 100 on the regular language representing a simple tokenized JSON list with no nesting,

([])|([[S0NTF](,[S0NTF])*])$

over the 8-letter alphabet {[, ], S, 0, N, T, F, ,}, to 100% accuracy on a training set of size 20,000 and a test set of size 2,000, both evenly split between positive and negative examples. As before, we applied our extraction method to this network.
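The target language can be checked directly with Python's re module, which also makes the flaw of the first extracted DFA easy to state: the specification rejects lists with a trailing comma such as [S,]:

```python
import re

# Zero-nesting tokenized JSON lists over the alphabet {[, ], S, 0, N, T, F, ,}.
# fullmatch plays the role of the $ anchor in the paper's regular expression.
json_list = re.compile(r"(\[\])|(\[[S0NTF](,[S0NTF])*\])")

for w in ["[]", "[S]", "[S,0,N]", "[S,]", "[,S]", "[[S]]"]:
    print(w, bool(json_list.fullmatch(w)))
```

Only the first three words belong to the language; a DFA that also accepts [S,] therefore disagrees with the target.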

Within 2 counterexamples (1 provided and 1 generated), our method extracted the automaton shown in Fig. 7a, which is almost, but not quite, representative of the target language. A few seconds later, it returned a counterexample to this DFA, which pushed L* to refine further and return the DFA shown in Fig. 7b, which is also almost but not quite representative of zero-nesting tokenized JSON lists.

Ultimately after 400 s, our method extracted (but did not reach equivalence on) an automaton of size 441, returning the counterexamples listed in Table 1 and achieving 100% accuracy against the network on both its train set and all

**Table 1.** Counterexamples returned to the equivalence queries made by L* during extraction of a DFA from a network trained to 100% accuracy on both train and test sets on the regular language ([])|([[S0NTF](,[S0NTF])*])$ over the 8-letter alphabet {[, ], S, 0, N, T, F, ,}. Counterexamples highlighting the discrepancies between the network behaviour and the target behaviour are shown in bold.


sampled sequence lengths. As before, we note that each state split by the method is justified by concrete inputs to the network, and so the extraction of a large DFA is a sign of the inherent complexity of the learned network behavior.

#### **3.2 Counterexamples**

For many RNN-acceptors that train to 100% accuracy and exhibit perfect test set behavior on large test sets, our method was able to find many simple examples which the network misclassifies.

For instance, for a network trained to classify simple email addresses over the 38-letter alphabet {a, b, ..., z, 0, 1, ..., 9, @, .} as defined by the regular expression

[a-z][a-z0-9]*@[a-z0-9]+.(com|net|co.[a-z][a-z])$

with 100% accuracy on a 40,000-sample train set and 100% accuracy on a 2,000-sample test set (i.e., a seemingly perfect network), the refinement-based L* extraction quickly returned several counterexamples, showing words that the network classifies incorrectly (e.g., the network accepted the non-email sequence 25.net). While we could not extract a representative DFA from the network in the allotted time frame, our method did show that the network learned a far more elaborate (and incorrect) function than needed.
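The specification can be replayed directly in Python to confirm that 25.net lies outside the target language:

```python
import re

# The target language of the experiment (dots taken literally, hence the escapes).
email = re.compile(r"[a-z][a-z0-9]*@[a-z0-9]+\.(com|net|co\.[a-z][a-z])")

print(bool(email.fullmatch("a1@b2.com")))   # a well-formed address: matches
print(bool(email.fullmatch("25.net")))      # accepted by the network, but it
                                            # starts with a digit and has no @
```

Any word on which the network and this specification disagree is, by definition, a misclassification by the network.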

Beyond demonstrating the counterexample generation capabilities of our extraction method, these results also highlight the brittleness in generalization of trained RNNs, and suggest that evidence based on test-set performance should be taken with extreme caution.

#### **4 Conclusion**

We provide a brief description of a journey from programs to (somewhat) interpretable deep models that work well across different tasks and different programming languages. As we gained experience with these models, the question of *what have they actually learned* became more important (and subtle). Attention over AST paths provides some insight on what drives the predictions performed by (some of) the models, but a different approach is required for RNN-based models. This motivated the second part of our journey, trying to extract an interpretable model from a given RNN acceptor. This also motivated future work on classifying what can and cannot be learned by different kinds of RNNs [12].

#### **References**

1. Allamanis, M., Barr, E.T., Bird, C., Sutton, C.: Suggesting accurate method and class names. In: Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2015, pp. 38–49. ACM, New York (2015). http://doi.acm.org/10.1145/2786805.2786849


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **Formal Reasoning About the Security of Amazon Web Services**

Byron Cook¹,²

¹ Amazon Web Services, Seattle, USA
byron@amazon.com
² University College London, London, UK

**Abstract.** We report on the development and use of formal verification tools within Amazon Web Services (AWS) to increase the security assurance of its cloud infrastructure and to help customers secure themselves. We also discuss some remaining challenges that could inspire future research in the community.

#### **1 Introduction**

Amazon Web Services (AWS) is a provider of *cloud services*, meaning on-demand access to IT resources via the Internet. AWS adoption is widespread, with over a million active customers in 190 countries, and \$5.1 billion in revenue during the last quarter of 2017. Adoption is also rapidly growing, with revenue regularly increasing between 40–45% year-over-year.

The challenge for AWS in the coming years will be to accelerate the development of its functionality while simultaneously increasing the level of security offered to customers. In 2011, AWS released over 80 significant services and features. In 2012, the number was nearly 160; in 2013, 280; in 2014, 516; in 2015, 722; in 2016, 1,017. Last year the number was 1,430. At the same time, AWS is increasingly being used for a broad range of security-critical computational workloads.

Formal automated reasoning is one of the investments that AWS is making in order to facilitate continued simultaneous growth in both functionality and security. The goal of this paper is to convey information to the formal verification research community about this industrial application of the community's results. Toward that goal we describe work within AWS that uses formal verification to raise the level of security assurance of its products. We also discuss the use of formal reasoning tools by externally-facing products that help customers secure themselves. We close with a discussion about areas where we see that future research could contribute further impact.

*Related Work.* In this work we discuss efforts to make formal verification applicable to use-cases related to cloud security at AWS. For information on previous work within AWS to show functional correctness of some key distributed algorithms, see [43]. Other providers of cloud services also use formal verification to establish security properties, *e.g.* [23,34].

© The Author(s) 2018

H. Chockler and G. Weissenbacher (Eds.): CAV 2018, LNCS 10981, pp. 38–47, 2018. https://doi.org/10.1007/978-3-319-96145-3_3

Our overall strategy on the application of formal verification has been heavily influenced by the success of previous applied formal verification teams in industrial settings that worked as closely with domain experts as possible, *e.g.* work at Intel [33,50], NASA [31,42], Rockwell Collins [25], the Static Driver Verifier project [20], Facebook [45], and the success of Prover AB in the domain of railway switching [11].

External tools that we use include Boogie [1], Coq [4], CBMC [2], CVC4 [5], Dafny [6], HOL-light [8], Infer [9], OpenJML [10], SAW [13], SMACK [14], Souffle [37], TLA+ [15], VCC [16], and Z3 [17]. We have also collaborated with many organizations and individuals, *e.g.* Galois, Trail of Bits, the University of Sydney, and the University of Waterloo. Finally, many PhD student interns have applied their prototype tools to our problems during their internships.

#### **2 Security of the Cloud**

Amazon and AWS aim to innovate quickly while simultaneously improving on security. An original tenet from the founding of the AWS security team is to never be the organization that says *"no"*, but instead to be the organization that answers difficult security challenges with *"here's how"*. Toward this goal, the AWS security team works closely with product service teams to quickly identify and mitigate potential security concerns as early as possible while simultaneously not slowing the development teams down with bureaucracy. The security team also works with service teams early to facilitate the certification of compliance with industry standards.

The AWS security team performs formal security reviews of all features/services, *e.g.* 1,430 services/features in 2017, a 41% year-over-year increase from 2016. Mitigations to security risks that are developed during these security reviews are documented as a part of the security review process. Another important activity within AWS is ensuring that the cloud infrastructure *stays* secure after launch, especially as the system is modified incrementally by developers.

**Where Formal Reasoning Fits In.** The application security review process used within AWS increasingly involves the use of deductive theorem proving and/or symbolic model checking to establish important temporal properties of the software. For example, in 2017 alone the security team used deductive theorem provers or model checking tools to reason about cryptographic protocols/systems (*e.g.* [24]), hypervisors, boot-loaders/BIOS/firmware (*e.g.* [27]), garbage collectors, and network designs. Overall, formal verification engagements within the AWS security team increased 76% year-over-year in 2017, and found 45% more pre-launch security findings year-over-year in 2017.

To support our needs we have modified a number of open-source projects and contributed those changes back. For example, changes to CBMC [2] facilitate its application to C-based systems at the bottom of the compute stack used in AWS data centers [27]. Changes to SAW [13] add support for the Java programming language. Contributions to SMACK [14] implement automata-theoretic constructions that facilitate automatic proofs that s2n [12] correctly implements the *code balancing* mitigation for side-channel timing attacks. Source-code contributions to OpenJML [10] add support for Java 8 features needed to prove the correctness of code implementing a secure streaming protocol used throughout AWS.

In many cases we use formal verification tools *continuously* to ensure that security is implemented as designed, *e.g.* [24]. In this scenario, whenever changes and updates to the service/feature are developed, the verification tool is reexecuted automatically prior to the deployment of the new version.

The security operations team also uses automated formal reasoning tools in its effort to identify security vulnerabilities found in internal systems and determine their potential impact on demand. For example, an SMT-based semantic-level policy reasoning tool is used to find misconfigured resource policies.
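The flavor of such a semantic-level check can be sketched in a few lines of pure Python. This is a toy stand-in: the statement format, evaluation rules, and names below are simplified assumptions for illustration, not the actual AWS policy language or tool:

```python
# Toy policy model: each statement is (effect, principal_pattern, action_pattern),
# with "*" as a wildcard. A deliberately simplified sketch of IAM-style semantics.

def matches(pattern, value):
    return pattern == "*" or pattern == value

def allows(policy, principal, action):
    """Simplified evaluation: an explicit Deny always wins over any Allow."""
    allowed = False
    for effect, p, a in policy:
        if matches(p, principal) and matches(a, action):
            if effect == "Deny":
                return False
            allowed = True
    return allowed

def publicly_accessible(policy, actions):
    """Semantic question: can the anonymous principal perform any action?"""
    return any(allows(policy, "anonymous", a) for a in actions)

policy = [
    ("Allow", "*", "s3:GetObject"),          # wildcard principal: misconfiguration
    ("Deny", "anonymous", "s3:PutObject"),
]
print(publicly_accessible(policy, ["s3:GetObject", "s3:PutObject"]))  # True
```

An SMT-based tool answers the same kind of question symbolically over all possible requests, rather than by enumerating a fixed set of actions as this sketch does.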

In general we have found that the internal use of formal reasoning tools provides good value for the investment made. Formal reasoning provides higher levels of assurance than testing for the properties established, as it provides clear information about what has and has not been secured. Furthermore, formal verification of systems can begin long before code is written, as we can prove the correctness of the high-level algorithms and protocols, and use under-constrained symbolic models for unwritten code or hardware that has not been fabricated yet.

#### **3 Securing Customers** *in* **the Cloud**

AWS offers a set of cloud-based services designed to help customers be secure *in* the cloud. Some examples include AWS Config, which provides customers with information about the configurations of their AWS resources; Amazon Inspector, which provides automated security assessments of customer-authored AWSbased applications; Amazon GuardDuty, which monitors AWS accounts looking for unusual account usage on behalf of customers; Amazon Macie, which helps customers discover and classify sensitive data at risk of being leaked; and AWS Trusted Advisor, which automatically makes optimization and security recommendations to customers.

In addition to automatic cloud-based security services, AWS provides people to help customers: *Solutions Architects* from different disciplines work with customers to ensure that they are making the best use of available AWS services; *Technical Account Managers* are assigned to customers and work with them when security or operational events arise; the *Professional Services* team can be hired by customers to work on bespoke cloud-based solutions.

**Where Formal Reasoning Fits In.** Automated formal reasoning tools today provide functionality to customers through the AWS services Config, Inspector, GuardDuty, Macie, Trusted Advisor, and the storage service S3. As an example, customers using the S3 web-based console receive alerts, via SMT-based reasoning, when their S3 bucket policies are possibly misconfigured. AWS Macie uses the same engine to find possible data exfiltration routes. Another application is the use of high-performance datalog constraint solvers (*e.g.* [37]) to reason about questions of reachability in complex virtual networks built using AWS EC2 networking primitives. The theorem proving service behind this functionality regularly receives tens of millions of calls daily.
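Datalog-style reachability of this kind boils down to computing a transitive closure by fixpoint iteration. A sketch over a hypothetical edge list (the real tools operate on models of EC2 networking primitives, not a hand-written graph):

```python
# Datalog-style rule: reach(A, C) :- reach(A, B), reach(B, C).
def transitive_closure(edges):
    """Iterate the rule until no new facts are derived (a fixpoint)."""
    facts = set(edges)
    while True:
        derived = {(a, c) for (a, b) in facts for (b2, c) in facts if b == b2}
        if derived <= facts:
            return facts
        facts |= derived

# Hypothetical "can-send-to" edges between instances in a virtual network.
edges = [("web", "app"), ("app", "db")]
closure = transitive_closure(edges)
print(("web", "db") in closure)   # db is reachable from web in two hops
```

Engines such as [37] evaluate rules like this one over far larger fact sets, which is what makes whole-network reachability questions answerable at interactive speeds.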

In addition to the automated services that use formal techniques, some members of the AWS Solutions Architects, Technical Account Managers, and Professional Services teams are applying and/or deploying formal verification directly with customers. In particular, in certain security-sensitive sectors (*e.g.* financial services), the Professional Services organization is working directly with customers to deploy formal reasoning into their AWS environments.

The customer reaction to features based on formal reasoning tools has been overwhelmingly positive, both anecdotally as well as quantitatively. Calls by AWS services to the automated reasoning tools increased by four orders of magnitude in 2017. With the formal verification tools providing the semantic foundation, customers can make stronger universal statements about their policies and networks and be confident that their assumptions are not violated.

#### **4 Challenges**

At AWS we have successfully applied existing or bespoke formal verification tools to both raise the level of security assurance *of* the cloud as well as help customers protect themselves *in* the cloud. We now know that formal verification provides value to applications in cloud security. There are, however, many problems yet to be solved and many applications of formal verification techniques yet to be discovered and/or applied. In the future we are hoping to solve the problems we face in partnership with the formal verification research community. In this section we outline some of those challenges. Note that in many cases existing teams in the research community will already be working on topics related to these problems, too many to cite comprehensively. Our comments are intended to encourage and inspire more work in this space.

**Reasoning About Risk and Feasibility.** A security engineer spends the majority of their time informally reasoning about risk. The same is true for any corporate Chief Information Security Officer (CISO). We (the formal verification community) potentially have a lot to contribute in this space by developing systems that help reason more formally about the consequences of combinations of events and their relationships to bugs found in systems. Furthermore, our community has a lot to offer by bridging between our concept of a counterexample and the security community's notion of a *proof of concept* (PoC), which is a constructive realization of a security finding in order to demonstrate its feasibility. Often security engineers will develop partial PoCs, meaning that they combine reasoning about risk and the finding of constructive witnesses in order to increase their confidence in the importance of a finding. There are valuable results yet to be discovered by our community at the intersection of reasoning about and synthesis of threat models, environment models, risk/probabilities, counterexamples, and PoCs. A few examples of current work on this topic include [18,28,30,44,48].

**Fixes Not Findings.** Industrial users of formal verification technology need to make systems more secure, not merely find security vulnerabilities. This is true both for securing the cloud and for helping customers be secure in the cloud. If there are security findings, the primary objective is to find them *and* fix them quickly. In practice a lot of work lies ahead for an organization once a security finding has been identified. As a community, the more we can do to reduce the friction for users trying to triage and fix vulnerabilities, the better. Tools that report false findings are quickly ignored by developers, so as a community we should focus on improving the fidelity of our tools. Counterexamples can be downplayed by optimistic developers: any assistance in helping users understand the bugs found and/or their consequences is helpful. Security vulnerabilities that require fixes that are hard to build or hard to deploy are an especially important challenge: our community has a lot to offer here via the development of more powerful synthesis/repair methods (*e.g.* [22,32,39]) that take into account threat models, environment models, probabilities, and counterexamples.

**Auditable Proof Artifacts for Compliance.** Proof is actually two activities: *searching* for a candidate proof, and *checking* the candidate proof's validity. The searching is the art form, often involving a combination of heuristics that attempt to work around the undecidable. The checking of a proof is (in principle) the boring yet rigorous part, usually decidable and often linear in the size of the proof. Proof artifacts that can be re-checked have value, especially in applications related to compliance certification, *e.g.* DO-333 [26], CENELEC EN 50128 SIL 4 [11], EAL7 MILS [51]. Non-trivial parts of the various compliance and conformance standards can be checked via mechanical proof, *e.g.* parts of PCI and FIPS 140. Proofs of compliance controls that can be shared and checked/re-checked can reduce the cost of compliance certification, as well as the time-to-market for organizations that require certification before using systems.
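The search/check split can be illustrated with propositional resolution: finding a refutation may be expensive, but checking a claimed refutation is a single linear pass over its steps. The sketch below assumes a minimal invented certificate format (indices of two prior clauses plus a pivot literal); real certificate formats such as DRAT are richer.

```python
def resolve(c1, c2, pivot):
    """Resolve clauses c1 and c2 on pivot (pivot in c1, -pivot in c2)."""
    assert pivot in c1 and -pivot in c2, "ill-formed proof step"
    return frozenset((c1 | c2) - {pivot, -pivot})

def check_refutation(axioms, steps):
    """Replay a claimed resolution refutation.  Each step names two
    previously derived clauses by index plus a pivot literal; the
    certificate is valid iff the final derived clause is empty.
    Checking is one linear pass over the proof."""
    derived = [frozenset(c) for c in axioms]
    for i, j, pivot in steps:
        derived.append(resolve(derived[i], derived[j], pivot))
    return derived[-1] == frozenset()

# Refute {x1}, {-x1, x2}, {-x2} (literals are signed integers):
axioms = [{1}, {-1, 2}, {-2}]
steps = [(0, 1, 1),  # resolve clauses 0 and 1 on x1, deriving {x2} as clause 3
         (3, 2, 2)]  # resolve {x2} with clause 2 on x2, deriving the empty clause
print(check_refutation(axioms, steps))  # True
```

The asymmetry is the point: an auditor re-running `check_refutation` needs no heuristics, no solver, and no trust in whoever found the proof.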

**Tracking Casual or Unrealistic Assumptions.** Practical formal verification efforts often make unrealistic assumptions that are later forgotten. As an example, most tools assume that the systems we are analyzing are immune to *single-event upsets*, *e.g.* ionizing particles striking the microprocessor or semiconductor memory. We sometimes assume compilers and runtime garbage collectors are correct. In some cases (*e.g.* [20]) the environment models used by formal verification tools do not capture all possible real-world scenarios. As formal verification tools become more powerful and useful, we will increasingly need to reason about what has and has not been proved, in order to avoid misunderstandings that could lead to security vulnerabilities. In applications of security, this reasoning about the assumptions made will need to interact with the treatment of risk and how risk is modified by various mitigations, *e.g.* some mitigations for single-event upsets make the events so unlikely that they are not a viable security risk, though still not impossible. This topic has been the focus of some attention over the years, *e.g.* the CLINC stack [41], CompCert [3], and DeepSpec [7]. We believe that this will become an increasingly important problem in the future.

**Distributed Formal Verification in the Cloud.** Formal verification tools do not take enough advantage of modern data centers via distributed, coordinated processes. Some examples of work in the right direction include [21,35,36,38,40,47]. Especially in the area of program verification and analysis, our community still focuses on procedures that work on single computers, or perhaps *portfolio* solvers that try different problem encodings or solvers in parallel. Today large formal verification problems are often decomposed manually and then solved in parallel. There has not been much research into methods for automatically introducing and managing such decompositions in cloud-based distributed systems. This is perhaps in part due to the rules at various annual competitions such as SV-COMP, SMT-COMP, and CASC. We encourage the participants and organizers of competitions to move to cloud-based competitions where solvers have the freedom to use cloud-scale distributed computing to solve formal verification problems. Tool developers could build AMIs or CloudFormation templates that allow cloud distribution. Perhaps future contestants might even make Internet endpoints available with APIs supporting SMTLIB or TPTP, such that the competition is simply a series of remote API calls to each competitor's implementation. In this case competitors that embrace the full power of the cloud will have an advantage, and we will see dramatic improvements in the computational power of our formal verification tools.
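As a minimal illustration of the portfolio style mentioned above, the sketch below races several configurations of a deliberately naive SAT procedure and takes the first answer to finish; in a cloud setting the local thread pool would be replaced by remote workers. The "solver" and its configurations are toy stand-ins, not real competition solvers.

```python
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait
from itertools import product

def brute_force_sat(clauses, variables):
    """Naive SAT 'solver': enumerate assignments in the given variable
    order.  Each portfolio member uses a different order, standing in
    for genuinely distinct solvers or encodings."""
    for bits in product([False, True], repeat=len(variables)):
        assign = dict(zip(variables, bits))
        if all(any(assign[abs(lit)] == (lit > 0) for lit in c) for c in clauses):
            return assign
    return None

def portfolio(clauses, orders):
    """Run every member concurrently; return the first finished answer."""
    with ThreadPoolExecutor() as ex:
        futures = [ex.submit(brute_force_sat, clauses, o) for o in orders]
        done, _ = wait(futures, return_when=FIRST_COMPLETED)
        return next(iter(done)).result()

clauses = [[1, 2], [-1, 3], [-2, -3]]  # (x1 v x2)(-x1 v x3)(-x2 v -x3)
model = portfolio(clauses, orders=[[1, 2, 3], [3, 2, 1]])
print(all(any(model[abs(lit)] == (lit > 0) for lit in c) for c in clauses))  # True
```

Automatic decomposition goes a step further than this: rather than racing whole-problem attempts, the work itself (e.g. the search space or the proof obligations) is split across machines and the partial results recombined.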

**Continuous Formal Verification.** As discussed previously, we have found that it is important to focus on *continuous verification*: it is not enough to simply prove the correctness of a protocol or system once; we need to *continuously* prove the desired property during the lifetime of the system [24]. This matches reports from elsewhere in industry where formal verification is being applied, *e.g.* [45]. An interesting consequence of our focus on continuous formal verification is that the time and effort spent finding an initial proof before a system is deployed is not as expensive as the time spent maintaining the proof later, as the up-front human cost of the pre-launch proof is amortized over the lifetime of the system. It would be especially interesting to see approaches developed that synthesize new proofs of modified code based on existing proofs of unmodified code.
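One way to picture the mechanics of continuous verification is proof-result caching keyed by a hash of the verified artifact: any change to the artifact re-triggers proof maintenance, while an unchanged artifact reuses the existing proof. The sketch below is purely illustrative; `prove` stands in for a real verification tool, and the string check it performs is a placeholder.

```python
import hashlib

def digest(artifact):
    """Content hash identifying a version of the verified artifact."""
    return hashlib.sha256(artifact.encode()).hexdigest()

class ContinuousVerifier:
    """Cache proof results by artifact hash: unchanged artifacts reuse
    the existing proof; every change forces re-verification.  `prove`
    is a stand-in for an expensive real proof search."""
    def __init__(self, prove):
        self.prove = prove
        self.cache = {}

    def check(self, artifact):
        key = digest(artifact)
        if key not in self.cache:
            self.cache[key] = self.prove(artifact)  # expensive proof search
        return self.cache[key]

cv = ContinuousVerifier(prove=lambda src: "unsafe" not in src)
print(cv.check("send(encrypt(msg))"))  # True  -- proof found and cached
print(cv.check("send(unsafe(msg))"))   # False -- the change was re-verified
```

In practice such a check runs in the build pipeline on every commit, which is what amortizes the up-front proof cost over the system's lifetime.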

**The Known Problems are Still Problems.** Many of the problems that we face in AWS are well known to the formal verification community. For example, we need better tools for formal reasoning about languages such as Ruby, Python, and JavaScript, *e.g.* [29,49]. Proofs about security-oriented properties of many large open source systems remain an open problem, *e.g.* Angular, Linux, OpenJDK, React, NGINX, Xen. Many formal verification tools are hard to use. Many tools are brittle prototypes only developed for the purposes of publication. A better understanding of ISAs and memory models (*e.g.* [19,46]) is also key to proving the correctness of code operating on low-level devices. Practical and scalable methods for proving the correctness of distributed and/or concurrent systems remain an open problem. Improvements to the performance and scalability of formal verification tools are needed to prove the correctness of larger modules without manual decomposition. Abstraction refinement continues to be a problem, as false bugs are expensive to triage in an industrial setting. Buggy (and thus unsound) proof-based tools erode the trust in formal verification of the very users who are trying to deploy them.

### **5 Conclusion**

In this paper we have discussed how formal verification contributes to the ability of AWS to quickly develop and deploy new features while simultaneously increasing the security of the AWS cloud infrastructure, and how formal verification techniques contribute to customer-facing AWS services. We have also outlined some of the challenges we face. We actively seek solutions to these problems and are happy to collaborate with partners in this pursuit. We look forward to more partnerships, more tools, more collaboration, and more sharing of information as we try to bring affordable, efficient, and secure computation to all.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Tutorials

### **Foundations and Tools for the Static Analysis of Ethereum Smart Contracts**

Ilya Grishchenko, Matteo Maffei, and Clara Schneidewind

TU Wien, Vienna, Austria

{ilya.grishchenko,matteo.maffei,clara.schneidewind}@tuwien.ac.at

**Abstract.** The recent growth of the blockchain technology market puts its main cryptocurrencies in the spotlight. Among them, Ethereum stands out due to its virtual machine (EVM) supporting smart contracts, i.e., distributed programs that control the flow of the digital currency Ether. Being written in a Turing complete language, Ethereum smart contracts allow for expressing a broad spectrum of financial applications. The price for this expressiveness, however, is a significant semantic complexity, which increases the risk of programming errors. Recent attacks exploiting bugs in smart contract implementations call for the design of formal verification techniques for smart contracts. This, however, requires rigorous semantic foundations, a formal characterization of the expected security properties, and dedicated abstraction techniques tailored to the specific EVM semantics. This work will overview the state-of-the-art in smart contract verification, covering formal semantics, security definitions, and verification tools. We will then focus on EtherTrust [1], a framework for the static analysis of Ethereum smart contracts which includes the first complete small-step semantics of EVM bytecode, the first formal characterization of a large class of security properties for smart contracts, and the first static analysis for EVM bytecode that comes with a proof of soundness.

#### **1 Introduction**

Blockchain technologies promise secure distributed computations even in the absence of trusted third parties. The core of this technology is a distributed ledger that keeps track of previous transactions and the state of each account, and whose functionality and security are ensured by a careful combination of incentives and cryptography. Within this framework, software developers can implement sophisticated distributed, transaction-based computations by leveraging the scripting language offered by the underlying cryptocurrency. While many of these cryptocurrencies have an intentionally limited scripting language (e.g., Bitcoin [2]), Ethereum was designed from the ground up with a quasi Turing-complete language<sup>1</sup>. Ethereum programs, called *smart contracts*, have thus found a variety of

<sup>1</sup> While the language itself is Turing complete, computations are associated with a bounded computational budget (called gas), which gets consumed by each instruction thereby enforcing termination.

© The Author(s) 2018

H. Chockler and G. Weissenbacher (Eds.): CAV 2018, LNCS 10981, pp. 51–78, 2018. https://doi.org/10.1007/978-3-319-96145-3\_4

appealing use cases, such as auctions [3], data management systems [4], financial contracts [5], elections [6], trading platforms [7,8], permission management [9] and verifiable cloud computing [10], just to mention a few. Given their financial nature, bugs and vulnerabilities in smart contracts may lead to catastrophic consequences. For instance, the infamous DAO vulnerability [11] recently led to a \$60M financial loss, and similar vulnerabilities occur on a regular basis [12,13]. Furthermore, many smart contracts in the wild are intentionally fraudulent, as highlighted in a recent survey [14].

A rigorous security analysis of smart contracts is thus crucial for society's trust in blockchain technologies and their widespread deployment. Unfortunately, this task is quite challenging for various reasons. First, Ethereum smart contracts are developed in an ad-hoc language, called Solidity, which resembles JavaScript but features specific transaction-oriented mechanisms and a number of non-standard semantic behaviours, as further described in this paper. Second, smart contracts are uploaded to the blockchain in the form of Ethereum Virtual Machine (EVM) bytecode, a stack-based low-level code featuring dynamic code creation and invocation and, in general, very little static information, which makes it extremely difficult to analyze.

**Our Contributions.** This work overviews the existing approaches taken towards formal verification of Ethereum smart contracts and discusses EtherTrust, the first sound static analysis tool for EVM bytecode. Specifically, our contributions are


**Outline.** The remainder of this paper is organized as follows. Section 2 briefly overviews the Ethereum architecture, Sect. 3 reviews the state of the art in formal verification of Ethereum smart contracts, Sect. 4 revisits the Ethereum small-step semantics introduced by [15], Sect. 5 presents the single-entrancy property for smart contracts as defined by [15], Sect. 6 discusses the key ideas of the first sound static analysis for Ethereum bytecode as implemented in EtherTrust [1], Sect. 7 shows how reachability properties can automatically be checked using EtherTrust, and Sect. 8 concludes summarizing the key points of the paper.

#### **2 Background on Ethereum**

In the following we briefly overview the mechanics of the cryptocurrency Ethereum and its built-in scripting language, EVM bytecode.

#### **2.1 Ethereum**

Ethereum is a cryptographic currency system built on top of a blockchain. Similar to Bitcoin, network participants publish transactions to the network that are then grouped into blocks by distinct nodes (the so-called *miners*) and appended to the blockchain using a proof of work (PoW) consensus mechanism. The state of the system – which we will also refer to as the *global state* – consists of the states of the different accounts populating it. An account can either be an external account (belonging to a user of the system) that carries information on its current balance, or a contract account that additionally holds persistent storage and the contract's code. Account balances are given in the subunit *wei* of the virtual currency *Ether*.<sup>2</sup>

Transactions can alter the state of the system by either creating new contract accounts or by calling an existing account. Calls to external accounts can only transfer Ether to this account, but calls to contract accounts additionally execute the code associated to the contract. The contract execution might alter the storage of the account or might again perform transactions – in this case we talk about *internal transactions*.

The execution model underlying the execution of contract code is described by a virtual state machine, the *Ethereum Virtual Machine* (EVM). It is *quasi Turing complete*: the otherwise Turing-complete execution is restricted by the resource *gas*, fixed upfront, which effectively limits the number of execution steps. The originator of the transaction can specify the maximal amount of gas that may be spent for the contract execution and also determines the gas price (the amount of wei to pay per unit of gas). Upfront, the originator pays for the gas limit according to the gas price; if a successful contract execution does not spend the whole amount of gas dedicated to it, the originator is reimbursed for the gas that is left. The wei paid for the used gas are given as a fee to a beneficiary address specified by the miner.
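The gas accounting described above amounts to simple arithmetic, sketched below with made-up numbers for the limit, usage, and price (the function name is ours, not part of Ethereum):

```python
WEI_PER_ETHER = 10**18  # one Ether is 10^18 wei (see footnote 2)

def settle(gas_limit, gas_used, gas_price_wei):
    """Settlement of a successful transaction (all amounts in wei):
    the originator pays gas_limit * price upfront, is reimbursed for
    unused gas, and the rest goes to the miner's beneficiary address."""
    assert gas_used <= gas_limit, "exceeding the limit raises an exception"
    upfront = gas_limit * gas_price_wei
    refund = (gas_limit - gas_used) * gas_price_wei
    fee = gas_used * gas_price_wei
    assert upfront == refund + fee  # no wei is lost in the accounting
    return refund, fee

# Illustrative numbers only:
refund, fee = settle(gas_limit=21_000, gas_used=20_000, gas_price_wei=2 * 10**9)
print(fee)  # 40000000000000 wei (0.00004 Ether) paid to the beneficiary
```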

#### **2.2 EVM Bytecode**

Contracts are delivered and executed in *EVM bytecode* format – an assembler-like bytecode language. As the core of the EVM is a stack-based machine, the set of instructions in EVM bytecode consists mainly of standard instructions for stack operations, arithmetic, jumps, and local memory access. The classical set of instructions is enriched with an opcode for the SHA3 hash and several opcodes for accessing the environment that the contract was called in. In addition, there are opcodes for accessing and modifying the storage of the account

<sup>2</sup> One Ether is equivalent to 10<sup>18</sup> wei.

currently running the code, and distinct opcodes for performing internal call and create transactions. Another instruction particular to the blockchain setting is the SELFDESTRUCT opcode, which deletes the currently executing contract – but only after the successful execution of the external transaction.

The execution of each instruction consumes a positive amount of *gas*. The sender of the transaction specifies a gas limit and exceeding it results in an exception that reverts the effects of the current transaction on the global state. In the case of nested transactions, the occurrence of an exception only reverts its own effects, but not those of the calling transaction. Instead, the failure of an internal transaction is only indicated by writing zero to the caller's stack.
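The stack-machine execution and the revert-on-exception discipline can be illustrated with a toy interpreter. The opcode set, the flat gas cost, and the calling convention below are simplified inventions for exposition, not the actual EVM semantics:

```python
class OutOfGas(Exception):
    pass

def run(code, gas, storage):
    """Toy stack machine in the spirit of the EVM fragment described
    above.  Every step consumes one unit of gas (real costs vary)."""
    stack, pc = [], 0
    while pc < len(code):
        gas -= 1
        if gas < 0:
            raise OutOfGas
        op = code[pc]
        if op == "PUSH":
            pc += 1
            stack.append(code[pc])
        elif op == "ADD":
            stack.append(stack.pop() + stack.pop())
        elif op == "SSTORE":           # pops key, then value; writes storage
            key, value = stack.pop(), stack.pop()
            storage[key] = value
        elif op == "STOP":
            break
        pc += 1
    return stack

def call(storage, code, gas):
    """Internal-transaction discipline: an exception reverts only this
    call's storage effects, and failure is signalled by returning 0."""
    snapshot = dict(storage)
    try:
        run(code, gas, storage)
        return 1
    except OutOfGas:
        storage.clear()
        storage.update(snapshot)       # revert this call's effects only
        return 0

st = {}
print(call(st, ["PUSH", 7, "PUSH", 0, "SSTORE", "STOP"], gas=10), st)  # 1 {0: 7}
print(call(st, ["PUSH", 1, "PUSH", 0, "SSTORE"] * 5, gas=4), st)       # 0 {0: 7}
```

The second call overwrites the storage slot before running out of gas, yet the caller observes the old value: the exception reverted the inner effects and left only a 0 on the (here, implicit) caller's stack.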

### **3 Overview on Formal Verification Approaches**

In the following we give an overview of the approaches taken so far towards securing (Ethereum) smart contracts. We distinguish between verification approaches and design approaches. According to our terminology, the goal of verification approaches is to check smart contracts written in existing languages (such as Solidity) for their compliance with a security policy or specification. In contrast, design approaches aim at facilitating the creation of secure smart contracts by providing frameworks for their development: these approaches encompass new languages that are more amenable to verification, provide a clear and simple semantics understandable by smart contract developers, or allow for a direct encoding of desired security policies. In addition, we include in this category works that provide design patterns for secure smart contracts.

#### **3.1 Verification**

In the field of smart contract verification we categorize the existing approaches along the following dimensions: target language (bytecode vs. high-level language), point of verification (static vs. dynamic analysis methods), provided guarantees (bug-finding vs. formal soundness guarantees), checked properties (generic vs. contract-specific properties), and degree of automation (automated verification vs. assisted analysis vs. manual inspection). The current spectrum of analysis tools falls into the following clusters:

**Static Analysis Tools for Automated Bug-Finding.** Oyente [16] is a state-of-the-art static analysis tool for EVM bytecode that relies on symbolic execution. Oyente supports a variety of pre-defined security properties, such as transaction order dependency, time-stamp dependency, and reentrancy, that can be checked automatically. However, Oyente strives for neither soundness nor completeness. This is on the one hand due to the simplified semantics that serves as the foundation of the analysis [15]. On the other hand, the security properties are rather syntactic or pattern-based and lack a semantic characterization. Recently, Zhou et al. proposed the static analysis tool SASC [17] that extends Oyente by additional patterns and provides a visualization of detected risks in the topology diagram of the original Solidity code.

Maian [18] extends the approach taken in Oyente to trace properties that consider multiple invocations of one smart contract. Like Oyente, it relies on symbolic execution, following a simplified version of the semantics used in Oyente, and uses a pattern-based approach for defining the concrete properties to be checked. The tool covers safety properties (such as prodigality and suicidality) and liveness properties (greediness). As with Oyente, the authors do not make any security claims, but consider their tool a 'bug catching approach'.

**Static Analysis Tools for Automated Verification of Generic Properties.** In contrast to the aforementioned class of tools, this line of research aims at providing formal guarantees for the analysis results.

A recently published work is the static analysis tool ZEUS [19] that analyzes smart contracts written in Solidity using symbolic model checking. The analysis proceeds by translating Solidity code to an abstract intermediate language, which in turn is translated to LLVM bitcode. Finally, existing symbolic model checking tools for LLVM bitcode are leveraged for checking generic security properties. ZEUS consequently only allows for analyzing contracts whose Solidity source code is made available. In addition, the semantics of the intermediate language cannot easily be reconciled with the actual Solidity semantics, which is determined by its translation to EVM bytecode. This is because the semantics of the intermediate language by design does not allow for reverting the global system state in the case of a failed call – which, however, is a fundamental feature of Ethereum smart contract execution.

Other tools proposed in the realm of automated static analysis for generic properties are Securify [20], Mythril [21], and Manticore [22] (for analyzing bytecode) and SmartCheck [23] and Solgraph [24] (for analyzing Solidity code). These tools, however, are not accompanied by any academic paper, so their concrete analysis goals remain unspecified.

**Frameworks for Semi-automated Proofs for Contract Specific Properties.** Hirai [25] formalizes the EVM semantics in the proof assistant Isabelle/HOL and uses it for manually proving safety properties for concrete contracts. This semantics, however, constitutes a sound over-approximation of the original semantics [26]. Building on top of this work, Amani et al. propose a sound program logic for EVM bytecode based on separation logic [27]. This logic allows for semi-automatic reasoning about correctness properties of EVM bytecode using the proof assistant Isabelle/HOL.

Hildebrandt et al. [28] define the EVM semantics in the K framework [29] – a language-independent verification framework based on reachability logic. The authors leverage the power of the K framework to automatically derive analysis tools from the specified semantics, presenting as examples a gas analysis tool, a semantic debugger, and a program verifier based on reachability logic. The derived program verifier still requires the user to manually specify loop invariants at the bytecode level.

Bhargavan et al. [30] introduce a framework to analyze Ethereum contracts by translation into F\*, a functional programming language aimed at program verification and equipped with an interactive proof assistant. The translation supports only a fragment of the EVM bytecode and does not come with a justifying semantic argument.

**Dynamic Monitoring for Predefined Security Properties.** Grossman et al. [31] propose the notion of effectively callback free executions and identify the absence of this property in smart contract executions as the source of common bugs such as reentrancy. They propose an efficient online algorithm for discovering executions violating effectively callback freeness. Implementing a corresponding monitor in the EVM would guarantee the absence of the potentially dangerous smart contract executions, but is not compatible with the current Ethereum version and would require a hard fork.
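The bug class targeted by effectively callback-free executions can be replayed in a few lines: a contract that performs its external call before its bookkeeping update lets the callee re-enter and drain funds. The sketch below models the pattern in plain Python; it illustrates the bug, not Grossman et al.'s monitoring algorithm, and all names and amounts are invented.

```python
class Bank:
    """Toy model of the classic reentrancy bug: the external call runs
    BEFORE the bookkeeping update, so the execution is not effectively
    callback-free."""
    def __init__(self, balances):
        self.balances = dict(balances)
        self.vault = sum(self.balances.values())  # funds held by the contract

    def withdraw(self, who, callback):
        amount = self.balances[who]
        if amount > 0 and self.vault >= amount:
            self.vault -= amount    # funds are sent: control passes to the callee,
            callback()              # ... which may re-enter withdraw()
            self.balances[who] = 0  # the state update arrives too late

bank = Bank({"attacker": 10, "victim": 90})

def reenter():                      # the attacker's fallback function
    if bank.vault >= bank.balances["attacker"] > 0:
        bank.withdraw("attacker", reenter)

bank.withdraw("attacker", reenter)
print(bank.vault)  # 0 -- the attacker's 10 plus the victim's 90 were drained
```

Because the nested calls all read the stale balance, the attacker withdraws ten times instead of once; performing the update before the external call (or, equivalently, enforcing effectively callback-free executions) removes the exploit.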

A dynamic monitoring solution compatible with Ethereum is offered by the tool DappGuard [32]. The tool actively monitors the incoming transactions to a smart contract and leverages the tool Oyente [16], its own analysis engine, and a simulation of the transaction on the testnet to judge whether an incoming transaction might cause a (generic) security violation (such as transaction order dependency). If a transaction is considered harmful, a counter transaction (killing the contract or performing some other fix) is made. The authors claim that this counter transaction will, with high probability, be mined before the problematic one. Due to this uncertainty and the bug-finding tools used for the evaluation of incoming transactions, this approach does not provide any guarantees.

#### **3.2 Design**

The current research on secure smart contract design focuses on the following four areas: high-level programming languages, intermediate languages (for verification), security patterns for existing languages and visual tools for designing smart contracts.

**High-Level Languages.** One line of research on high-level smart contract languages concentrates on the facilitation of secure smart contract design by limiting the language expressiveness and enforcing a strong static typing discipline. Simplicity [33] is a typed functional programming language for smart contracts that disallows loops and recursion. It is a general-purpose language for smart contracts and not tailored to the Ethereum setting. Simplicity comes with a denotational semantics specified in Coq that allows for reasoning formally about Simplicity contracts. As there is no (verified) compiler to EVM bytecode so far, such results do not carry over to Ethereum smart contracts. In the same realm, Pettersson and Edström [34] propose a library for the programming language Idris that allows for the development of secure smart contracts using dependent and polymorphic types. They extend the existing Idris compiler with a generator for Serpent code (a Python-like high-level language for Ethereum smart contracts). This compiler is a proof of concept and fails to compile more advanced contracts (as it cannot handle recursion). In a preliminary work, Coblenz [35] proposes Obsidian, an object-oriented programming language that pursues the goal of preventing common bugs in smart contracts such as reentrancy. To this end, Obsidian makes states explicit and uses a linear type system for quantities of money.

Another line of research focuses on designing languages that allow for encoding security policies that are dynamically enforced at runtime. A first step in this direction is sketched in the preliminary work on Flint [36], a type-safe, capabilities-secure, contract-oriented programming language for smart contracts that gets compiled to EVM bytecode. Flint allows for defining caller capabilities restricting the access to security sensitive functions. These capabilities shall be enforced by the EVM bytecode created during compilation. But so far, there is only an extended abstract available.

In addition to these approaches from academia, the Ethereum Foundation is currently developing the high-level languages Viper [37] and Bamboo [38]. Furthermore, the Solidity compiler used to support a limited export functionality to the intermediate language WhyML [39], allowing for pre/postcondition-style reasoning on Solidity code by leveraging the deductive program verification platform Why3 [40].

**Intermediate Languages.** The intermediate language Scilla [41] comes with a semantics formalized in the proof assistant Coq and therefore allows for a mechanized verification of Scilla contracts. In addition, Scilla makes some interesting design choices that might inspire the development of future high level languages for smart contracts: Scilla provides a strict separation not only between computation and communication, but also between pure and effectful computations.

**Security Patterns.** Wöhrer [42] describes programming patterns in Solidity that should be adopted by smart contract programmers for avoiding common bugs. These patterns encompass best coding practices such as performing calls at the end of a function, but also off-the-shelf solutions for common security bugs, such as locking a contract to avoid reentrancy, or the integration of a mechanism that allows the contract owner to disable sensitive functionality in the case of a bug.

**Tools.** Mavridou and Laszka [43] introduce a framework for designing smart contracts in terms of finite state machines. They provide a tool with a graphical editor for defining contract specifications as automata and give a translation of the constructed finite state machines to Solidity. In addition, they present some security extensions and patterns that can be used as off-the-shelf solutions for preventing reentrancy and implementing common security challenges such as time constraints and authorization. The approach, however, lacks formal foundations: the translation is not proven correct, nor are the security patterns shown to meet the desired security goals.
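The finite-state view of a contract can be sketched as a transition table that enables each function only in certain states, so that calls arriving in the wrong phase are rejected. The auction states and actions below are invented for illustration and are not the output of the authors' tool.

```python
class ContractFSM:
    """Finite-state view of a contract: an action is permitted only if
    the transition table enables it in the current state; anything else
    is rejected, as the generated guard code would do."""
    TRANSITIONS = {("open", "bid"): "open",
                   ("open", "close"): "closed",
                   ("closed", "refund"): "closed"}

    def __init__(self):
        self.state = "open"

    def invoke(self, action):
        nxt = self.TRANSITIONS.get((self.state, action))
        if nxt is None:
            raise RuntimeError(f"{action} not allowed in state {self.state}")
        self.state = nxt

auction = ContractFSM()
auction.invoke("bid")
auction.invoke("close")
print(auction.state)  # closed
# auction.invoke("bid") would now raise: bidding after close is rejected
```

Proving such a design approach correct would require showing both that the generated Solidity faithfully implements the automaton and that the automaton itself satisfies the intended security goals — exactly the two gaps noted above.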

#### **3.3 Open Challenges**

Even though the previous section highlights the wide range of steps taken towards the analysis of Ethereum smart contracts, many open challenges remain.

**Secure Compilation of High-Level Languages.** Even though several new high-level languages have been proposed that facilitate the design of secure smart contracts and that are more amenable to verification, none of them comes with a verified compiler to EVM bytecode so far. Such a secure compilation, however, is a prerequisite for results established on high-level language programs to carry over to the actual smart contracts published on the blockchain.

**Specification Languages for Smart Contracts.** So far, all approaches to verifying contract-specific properties focus either on ad hoc specifications in the respective verification framework [25,27,28,30] or on the insertion of assertions into existing contract code [39]. To leverage the power of existing model-checking techniques for program verification, a general-purpose contract specification language would need to be designed.

**Study of Security Policies.** So far, no fundamental research has been conducted on the classes of security policies that might be interesting to enforce in the setting of smart contracts. In particular, it would be compelling to characterize the class of security policies that can be enforced by smart contracts within the existing EVM.

**Compositional Reasoning About Smart Contracts.** Most research on smart contract verification focuses on reasoning about individual contracts or, at most, a small set of contracts whose bytecode is fully available. Even though there has been work observing the similarities between smart contracts and concurrent programs [44], there has been no rigorous study of compositional reasoning for smart contracts so far.

#### **4 Semantics**

Recently, Grishchenko et al. [15] introduced the first complete small-step semantics for EVM bytecode. As this semantics serves as a basis for the static analyzer EtherTrust, we briefly review its general layout and most important features in the following.

#### **4.1 Execution Configurations**

Before discussing the small-step rules of the semantics, we first introduce the general shape of execution configurations.

**Global State.** The global state of the Ethereum blockchain is represented as a (partial) mapping from account addresses to accounts. In the case that an account does not exist, we assume it to map to ⊥. Accounts are composed of a nonce n that is incremented with every contract that the account creates, a balance b, a persistent unbounded storage *stor*, and the account's code. External accounts carry empty code, which makes their storage inaccessible and hence irrelevant.
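The global state described above can be sketched as a partial mapping in a few lines. This is our own minimal encoding (field and function names are ours, not the paper's), with `None` playing the role of ⊥ for non-existent accounts:

```python
from dataclasses import dataclass, field

# Sketch of an account as described in the text: nonce, balance, persistent
# storage `stor`, and code (empty bytes for external accounts).
@dataclass
class Account:
    nonce: int = 0
    balance: int = 0                               # in wei
    stor: dict = field(default_factory=dict)       # word-addressed storage
    code: bytes = b""                              # empty for external accounts

# The global state sigma: a partial mapping from addresses to accounts.
sigma = {0xCAFE: Account(balance=10)}

def lookup(sigma, addr):
    # absent addresses map to None, modeling the paper's bottom element
    return sigma.get(addr)

assert lookup(sigma, 0xCAFE).balance == 10
assert lookup(sigma, 0xBEEF) is None
```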

**Small-Step Relation.** The semantics is formalized by a small-step relation Γ ⊨ S → S′ that specifies how a call stack S representing the state of the execution evolves within one step under the transaction environment Γ. We call the pair (Γ, S) a *configuration*.

**Transaction Environments.** The transaction environment represents the static information of the block that the transaction is executed in and the immutable parameters given to the transaction, such as the gas price or the gas limit. These parameters can be accessed by distinct bytecode instructions and consequently influence the transaction execution.

**Call Stacks.** A call stack S is a stack of execution states which represents the state of the overall execution of the initial external transaction. The individual execution states of the stack represent the states of the uncompleted internal transactions performed during the execution. Formally, a call stack is a stack of regular execution states of the form (μ, ι, σ) that can optionally be topped with a halting state *HALT*(σ, *gas*, d) or an exception state *EXC*. Semantically, halting states indicate regular halting of an internal transaction, exception states indicate exceptional halting, and regular execution states describe the state of internal transactions in progress. Halting and exception states can only occur as top elements of the call stack as they represent terminated internal transactions. Halting states carry the information affecting the callee state such as the global state σ that the internal execution halted in, the unspent gas *gas* from the internal transaction execution and the return data d.

The state of a non-terminated internal transaction is described by a regular execution state of the form (μ, ι, σ). The state is determined by the current global state σ of the system as well as the execution environment ι that specifies the parameters of the current transaction (including inputs and the code to be executed) and the local state μ of the stack machine.
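The shape of call stacks described above can be made concrete with a small sketch. The class names below are our own encodings of the paper's regular execution states (μ, ι, σ), halting states *HALT*(σ, *gas*, d), and the exception state *EXC*:

```python
from dataclasses import dataclass

# Regular execution state (mu, iota, sigma): machine state, execution
# environment, and global state of an internal transaction in progress.
@dataclass
class Regular:
    mu: object
    iota: object
    sigma: object

# Halting state HALT(sigma, gas, d): final global state, unspent gas,
# and return data of a regularly terminated internal transaction.
@dataclass
class Halt:
    sigma: object
    gas: int
    d: bytes

class Exc:        # exceptional halting
    pass

def well_formed(stack):
    # stacks are written top-first; halting/exception states represent
    # terminated transactions and may only appear as the top element
    return all(isinstance(s, Regular) for s in stack[1:])

assert well_formed([Halt(None, 5, b""), Regular(None, None, None)])
assert not well_formed([Regular(None, None, None), Exc()])
```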

**Table 1.** Semantic rules for ADD

$$\dfrac{\iota.\mathsf{code}\,[\mu.\mathsf{pc}] = \mathsf{ADD} \qquad \mu.\mathsf{s} = a :: b :: s \qquad \mu.\mathsf{gas} \ge 3 \qquad \mu' = \mu[\mathsf{s} \to (a+b) :: s][\mathsf{pc} \mathrel{+}= 1][\mathsf{gas} \mathrel{-}= 3]}{\Gamma \vDash (\mu, \iota, \sigma) :: S \to (\mu', \iota, \sigma) :: S}\;\textsc{Add}$$

$$\dfrac{\iota.\mathsf{code}\,[\mu.\mathsf{pc}] = \mathsf{ADD} \qquad (|\mu.\mathsf{s}| < 2 \lor \mu.\mathsf{gas} < 3)}{\Gamma \vDash (\mu, \iota, \sigma) :: S \to \mathit{EXC} :: S}\;\textsc{Add-Fail}$$

**Execution Environment.** The execution environment ι of an internal transaction is a tuple of static parameters (*actor*, *input*, *sender*, *value*, *code*) to the transaction that, among others, determine the code to be executed and the account in whose context the code will be executed. The execution environment incorporates the following components: the active account *actor*, i.e., the account that is currently executing and whose account state will be affected when instructions for storage modification or money transfer are performed; the input data *input* given to the transaction; the address *sender* of the account that initiated the transaction; the amount of wei *value* transferred with the transaction; and the code *code* that is executed by the transaction. The execution environment is determined upon initialization of an internal transaction execution, and it can be accessed, but not altered, during the execution.

**Machine State.** The local machine state μ represents the state of the underlying stack machine used for execution. Formally it is represented by a tuple (*gas*, *pc*, *m*, *aw*, *s*) holding the amount of gas *gas* available for execution, the program counter *pc*, the local memory *m*, the number of active words in memory *aw*, and the machine stack *s*.

The execution of each internal transaction starts in a fresh machine state, with an empty stack, memory initialized to all zeros, and program counter and active words in memory set to zero. Only the gas is instantiated with the gas value available for the execution. We call execution states with machine states of this form *initial*.
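The notion of an *initial* machine state translates directly into a small constructor. This is a sketch in our own encoding (a plain dict with the component names from the text):

```python
from collections import defaultdict

def fresh_machine_state(gas):
    # a fresh machine state as described above: empty stack, zero-initialized
    # memory, pc and active words at zero; only the gas value is supplied
    return {"gas": gas, "pc": 0, "m": defaultdict(int), "aw": 0, "s": []}

mu = fresh_machine_state(21000)
assert mu["pc"] == 0 and mu["s"] == []
assert mu["m"][42] == 0   # memory reads as zero everywhere
```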

#### **4.2 Small-Step Rules**

In the following, we will present a selection of interesting small-step rules in order to illustrate the most important features of the semantics.

**Local Instructions.** For demonstrating the overall design of the semantics, we start with the example of the arithmetic instruction ADD, which adds the two top values on the machine stack. The small-step rules for ADD are shown in Table 1. We use a dot notation in order to access components of the different state parameters, naming the components with the variable names introduced for them in the previous section, written in sans-serif style. In addition, we use the usual notation for updating components: t[c → v] denotes that the component c of tuple t is updated with value v. For expressing incremental updates more concisely, we additionally write t[c += v] to denote that the (numerical) component c of t is incremented by v and, similarly, t[c −= v] for decrementing a component c of t.

The execution of the arithmetic instruction ADD only performs local changes in the machine state, affecting the local stack, the program counter, and the gas budget. For deciding upon the correct instruction to execute, the currently executed code (which is part of the execution environment) is accessed at the position of the current program counter. An ADD instruction always costs three units of gas, which are subtracted from the gas budget in the machine state. Like every other instruction, ADD can fail due to lacking gas or due to an underflow on the machine stack. In this case, the exception state is entered and the execution of the current internal transaction is terminated. For better readability, we use here the slightly informal ∨ notation for combining the two error cases in one inference rule.
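The two rules of Table 1 can be sketched as one executable step function. This is our own encoding (the machine state is a dict; `EXC` is a sentinel for the exception state), and we assume EVM word arithmetic modulo 2^256:

```python
EXC = "EXC"   # sentinel for the exception state

def step_add(mu):
    # ADD-FAIL: stack underflow or insufficient gas
    if len(mu["s"]) < 2 or mu["gas"] < 3:
        return EXC
    # ADD: pop a and b, push their sum, advance pc, pay 3 units of gas
    a, b, *rest = mu["s"]
    return {**mu,
            "s": [(a + b) % 2**256] + rest,   # EVM addition is mod 2^256
            "pc": mu["pc"] + 1,
            "gas": mu["gas"] - 3}

mu = {"s": [1, 2, 7], "pc": 4, "gas": 10}
assert step_add(mu) == {"s": [3, 7], "pc": 5, "gas": 7}
assert step_add({"s": [1], "pc": 0, "gas": 10}) == EXC    # underflow
assert step_add({"s": [1, 2], "pc": 0, "gas": 2}) == EXC  # out of gas
```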

**Transaction Initiating Instructions.** A class of instructions with a more involved semantics are those instructions initiating internal transactions. This class incorporates instructions for calling another contract (CALL, CALLCODE and DELEGATECALL) and for creating a new contract (CREATE). We will explain the semantics of those instructions in an intuitive way omitting technical details.

The call instructions initiate a new internal call transaction whose parameters are specified on the machine stack – including the recipient (callee) and the amount of money to be transferred (in the case of CALL and CALLCODE). In addition, the input to the call is specified by providing the corresponding local memory fragment and analogously a memory fragment for the return value.

When executing a call instruction, the specified amount of wei is transferred to the callee and the code of the callee is executed. The different call types diverge in the environment that the callee code is executed in. In the case of a CALL instruction, while executing the callee code, (only) the account of the callee can be accessed and modified. So, intuitively, control is completely handed to the callee, as its code is executed in its own context. In contrast, in the case of CALLCODE, the executed callee code can (only) access and modify the account of the caller. So the callee's code is executed in the caller's context, which might be useful for using library functionality implemented in a separate library contract that, e.g., transfers money on behalf of the caller.

This idea is pushed even further in the DELEGATECALL instruction. This call type does not allow for transferring money and executes the callee's code not only in the caller's context, but even preserves part of the execution environment of the previous call (in particular the call value and the sender information). Intuitively, this instruction resembles adding the callee's code to the caller as

**Fig. 1.** Illustration of the semantics of different call types

an internal function so that calling it does not cause a new internal transaction (even though it formally does).

Figure 1 summarizes the behavior of the different call instructions in EVM bytecode. The executed code of the respective account is highlighted in orange, while the accessible account state is depicted in green. The remaining internal transaction information (as specified in the execution environment), i.e., the sender of the internal transaction and the transferred value, is marked in violet. In addition, the picture relates the corresponding changes to the small-step semantics: the execution of a call transaction adds a new execution state to the call stack while preserving the old one. The new global state σ records the changes in the accounts' balances, while the new execution environment ι determines the accessible account (by setting the actor of the internal transaction correspondingly), the code to be executed (by setting code), and further accessible transaction information such as the sender, value, and input (by setting sender, value, and input, respectively).
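The differences between the three call types can be summarized in a small sketch of how the callee's execution environment ι is set up. This is our own simplified encoding of the behavior described above (function and field names are ours), not the paper's formal rules:

```python
def callee_environment(call_type, caller_env, callee_addr, value, input_data, code_of):
    """Sketch: build the execution environment for an internal call transaction."""
    if call_type == "CALL":
        # callee's code runs in the callee's own context
        actor, sender, val = callee_addr, caller_env["actor"], value
    elif call_type == "CALLCODE":
        # callee's code runs in the caller's context
        actor, sender, val = caller_env["actor"], caller_env["actor"], value
    elif call_type == "DELEGATECALL":
        # caller's context, and sender/value are inherited from the previous call
        actor, sender, val = caller_env["actor"], caller_env["sender"], caller_env["value"]
    else:
        raise ValueError(call_type)
    return {"actor": actor, "input": input_data, "sender": sender,
            "value": val, "code": code_of(callee_addr)}

env = {"actor": "A", "sender": "S", "value": 5}
d = callee_environment("DELEGATECALL", env, "B", 0, b"", lambda a: b"code")
assert d["actor"] == "A" and d["sender"] == "S" and d["value"] == 5
c = callee_environment("CALL", env, "B", 7, b"", lambda a: b"code")
assert c["actor"] == "B" and c["sender"] == "A" and c["value"] == 7
```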

**Fig. 2.** Illustration of the semantics of the CREATE instruction (Color figure online)

The CREATE instruction initiates an internal transaction that creates a new account. The semantics of this instruction is similar to that of CALL, with the exception that a fresh account is created, which gets the specified value transferred, and that the input provided to this internal transaction, which is again specified in the local memory, is interpreted as the initialization code to be executed in order to produce the newly created account's code as output. Figure 2 depicts the semantics of the CREATE instruction in a similar fashion as for the call instructions above. Notably, the input to the CREATE instruction is interpreted as code and executed (therefore highlighted in orange) in the context of the newly created contract (highlighted in green). During this execution the newly created contract does not yet have any contract code itself (therefore depicted in gray); only after completion of the internal transaction is the return value of the transaction set as the code of the freshly created contract.

#### **5 Security Properties**

Grishchenko et al. [15] propose generic security definitions for smart contracts that rule out certain classes of potentially harmful contract behavior. These properties constitute trace properties (more precisely, safety properties) as well as hyperproperties (in particular, value-independence properties). In this work, we revisit one of these safety properties called *single-entrancy* and use this property as a case study for showing how safety properties of smart contracts (that can be over-approximated by pure reachability properties) can be automatically checked by static analysis. For checking value-independence properties, the reviewed analysis technique is extended in [1] with a simple dependency analysis that we will not discuss further in this work.

#### **5.1 Preliminary Notations**

Formally, contracts are represented as tuples of the form (a, *code*) where a denotes the address of the contract and *code* denotes the contract's code.

In order to give concise security definitions, we further introduce, and assume throughout the paper, an annotation of the small-step semantics that highlights the contract c that is currently executed. In the case of initialization code being executed, we use ⊥. We write S ++ S′ for the concatenation of call stacks S and S′. Finally, for arguing about EVM bytecode executions, we are only interested in those initial configurations that might result from a valid external transaction in a valid block. In the following, we will call these configurations *reachable* and refer to [15] for a detailed definition.

#### **5.2 Single-Entrancy**

For motivating the definition of single-entrancy, we introduce a class of bugs in Ethereum smart contracts called *reentrancy bugs* [14,16].

The most famous representative of this class is the so-called DAO bug that led to a loss of 60 million dollars in June 2016 [11]. In an attack exploiting this bug, the affected contract was drained of money by repeatedly reentering it and performing transactions to the attacker on behalf of the contract.

The cause of such bugs mostly roots in the developer's misunderstanding of the semantics of Solidity's call primitives. In general, calling a contract can invoke two kinds of actions: transferring Ether to the contract's account, or executing (parts of) the contract's code. In particular, Solidity's call construct (which is translated to a CALL instruction in EVM bytecode) invokes the execution of a fraction of the callee's code – specified in the so-called *fallback function*. A contract's fallback function is written as a function without name or arguments, as depicted in the Mallory contract in Fig. 3b.

Consequently, when using the call construct, the developer may expect an atomic value transfer, not anticipating that the callee's code is potentially executed as well. For illustrating how to exploit this sort of bug, we consider the contracts in Fig. 3.

**Fig. 3.** Reentrancy attack

The function ping of contract Bob sends an amount of 2 *wei* to the address specified in the argument. However, this should only be possible once, which is supposedly ensured by the sent variable that is set after the successful money transfer. It turns out, however, that invoking the call.value function on a contract's address invokes the contract's fallback function as well.

Given a second contract Mallory, it is possible to transfer more money than the intended 2 *wei* to the account of Mallory. By invoking Bob's function ping with the address of Mallory's account, 2 *wei* are transferred to Mallory's account and additionally the fallback function of Mallory is invoked. As the fallback function again calls the ping function with Mallory's address, another 2 *wei* are transferred before the variable sent of contract Bob is set. This looping goes on until all gas of the initial call is consumed or the callstack limit is reached. In this case, only the last transfer of *wei* is reverted and the effects of all former calls stay in place. Consequently, the intended restriction on contract Bob's ping function (namely to transfer 2 *wei* only once) is circumvented.
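The reentrancy loop described above can be simulated with a toy model. This is not EVM or Solidity code, but a minimal Python sketch of the control flow: Bob's ping transfers before setting `sent`, and Mallory's fallback reenters ping until a stand-in for the gas/callstack limit is hit:

```python
class Bob:
    def __init__(self, balance):
        self.balance, self.sent = balance, False

    def ping(self, mallory):
        if not self.sent:
            self.balance -= 2        # transfer 2 wei ...
            mallory.fallback(self)   # ... which triggers Mallory's fallback
            self.sent = True         # the guard is set only afterwards

class Mallory:
    def __init__(self):
        self.balance, self.calls = 0, 0

    def fallback(self, bob):
        self.balance += 2
        self.calls += 1
        if self.calls < 5:           # stand-in for the gas/callstack limit
            bob.ping(self)           # reenter ping before `sent` is set

bob, mallory = Bob(100), Mallory()
bob.ping(mallory)
assert mallory.balance == 10   # 5 transfers of 2 wei instead of the intended one
assert bob.balance == 90
```

The simulation makes the root cause visible: the guard `sent` is only written after the external call, so every reentering execution still sees `sent == False`.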

Motivated by these kinds of attacks, the notion of single-entrancy was introduced. Intuitively, a contract is single-entrant if it cannot perform any more calls once it has been reentered. Formally this property can be expressed in terms of the small-steps semantics as follows:

**Definition 1 (Single-entrancy** [15]**).** *A contract* c *is single-entrant if, for all reachable configurations* (Γ, s*<sup>c</sup>* :: S)*, it holds for all* s′, S′ *that*

$$\begin{aligned} \Gamma \models s\_c :: S \to^\* s'\_c :: S' ++ s\_c :: S\\ \implies \neg \exists s'' \in \mathcal{S}, c' \in \mathcal{C}\_\bot. \Gamma \models s'\_c :: S' ++ s\_c :: S \to^\* s''\_{c'} :: s'\_c :: S' ++ s\_c :: S \end{aligned}$$

This property constitutes a safety property. We will show in Sect. 7 how it can be appropriately abstracted for being expressed in the EtherTrust analysis framework.
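On a finite, explicitly given trace of call-stack snapshots, the violation pattern of Definition 1 can be checked directly. The following is our own simplified sketch (stacks are lists of executing-contract labels, top first): a violation is witnessed by a configuration s″<sub>c′</sub> :: s′<sub>c</sub> :: S′ ++ s<sub>c</sub> :: S, i.e., a stack whose second frame executes c, with c occurring again further below, meaning the reentered execution of c has performed another call:

```python
def violates_single_entrancy(trace, c):
    # a stack [c', c, ..., c, ...] (top first) witnesses a violation:
    # the reentered execution of c (second frame) has performed a further call
    return any(len(st) >= 3 and st[1] == c and c in st[2:] for st in trace)

# c is called, reentered ... but the reentered execution makes no further call:
ok = [["c"], ["x", "c"], ["c", "x", "c"]]
assert not violates_single_entrancy(ok, "c")

# ... here the reentered execution of c calls some contract y: violation
bad = ok + [["y", "c", "x", "c"]]
assert violates_single_entrancy(bad, "c")
```

This trace-level check mirrors the definition only on explicit traces; the point of Sect. 7 is to establish the property statically, over all reachable configurations at once.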

**Fig. 4.** Simplified soundness statement

#### **6 Verification**

Grishchenko et al. [1] developed a static analysis framework for analyzing reachability properties of EVM smart contracts. This framework relies on an abstract semantics for EVM bytecode soundly over-approximating the semantics presented in Sect. 4.

In the following we will review the abstractions performed on the small-step configurations and execution rules using the example of the abstract execution rule for the ADD instruction. Afterwards, we will discuss shortly how call instructions are over-approximated.

#### **6.1 Abstract Semantics**

Figure 4 gives an overview of the relation between the small-step and the abstract semantics. For the analysis, we consider a particular contract c<sup>∗</sup> under analysis whose code is known. An over-approximation of the behavior of this smart contract is encoded in *Horn clauses* (Δ). These describe how an abstract configuration (represented by a set of abstract state predicates) evolves during the execution of the contract's instructions. Abstract configurations are obtained by translating small-step configurations to a set Π of facts over state predicates that characterize (an over-approximation of) the original configuration. This transformation is performed with respect to the contract c<sup>∗</sup>: only the local behavior of this particular contract is over-approximated, and consequently only those elements on the callstack representing executions of c<sup>∗</sup> are translated. Finally, we will show that no matter how the contract c<sup>∗</sup> is called (so for every reachable configuration (Γ, s*<sup>c</sup>*<sup>∗</sup> :: S)), every sequence of execution steps performed while executing it can be mimicked by a derivation from the abstract configuration Π*<sup>s</sup>* (obtained by translating the execution state s) using the Horn clauses Δ (which model the abstract semantics of the contract c<sup>∗</sup>). More precisely, this means that from the set of facts Π*<sup>s</sup>* ∪ Δ a set Π′ can be derived that is a coarser abstraction (<:) of Π*<sup>S</sup>*′, the translation of the execution's intermediate call stack S′. A corresponding formal soundness statement is proven in [1].

#### **6.2 Abstract Configurations**

Table 2 shows the analysis facts used for describing the abstract semantics. These consist of (instances of) state predicates that represent partial abstract configurations. Accordingly, abstract configurations are sets of facts not containing any variables as arguments. We will refer to such facts as *closed facts*. Finally, abstract contracts are characterized as sets of Horn clauses over the state predicates (facts) that describe the state changes induced by the instructions at the different program positions. Here only those state predicates are depicted that are needed for describing the abstract semantics of the ADD instruction.

The state predicates are parametrized by a program point *pp* that is a tuple of the form (*id*∗, *pc*), with *id*<sup>∗</sup> being a contract identifier for contract c<sup>∗</sup> and *pc* being the program counter at which the abstract state holds.<sup>3</sup> The parametrization by the contract identifier allows the analysis to consider a set of contracts whose code is known (such as, e.g., library code that is known to be used by the contract). In this work, however, we focus on the case where c<sup>∗</sup>, represented by identifier *id*<sup>∗</sup>, is the only known contract. In addition, the predicates carry the relative call depth *cd* as argument. The relative call depth is the size of the call stack built up on the execution of c<sup>∗</sup> (cf. call stack S in Fig. 4) and serves as abstraction for the (relative) call stack that contract c<sup>∗</sup> is currently executed on.

<sup>3</sup> Making the program counter a parameter instead of an argument is a design choice made in order to minimize the number of recursive Horn clauses, simplifying automated verification.

**Table 2.** Analysis facts. All arguments of the analysis facts marked with a hat (ˆ·) range over D̂ ∪ *Vars*, where D̂ is the abstract domain and *Vars* is the set of variables. All other arguments of analysis facts range over ℕ, with the exception of *sa*, which ranges over (ℕ → D̂) ∪ *Vars*. Closed facts *cf* are facts whose arguments do not come from *Vars*.

The relative call depth helps to distinguish different recursive executions of c<sup>∗</sup> and thereby improves the precision of the analysis.

As the ADD instruction only operates on the local machine state, we focus on the abstract representation of the machine state μ: The state predicates representing μ are MState*<sup>pp</sup>* and Mem*<sup>pp</sup>*. The fact MState*<sup>pp</sup>* ((*size*, *sa*), *âw*, *ĝas*, *cd*) says that at program point *pp* and relative call depth *cd* the machine stack is of size *size* and its current configuration is described by the mapping *sa*, which maps stack positions to abstract values, *âw* represents the number of active words in memory, and *ĝas* is the remaining gas. Similarly, the fact Mem*<sup>pp</sup>* (*p̂os*, *v̂*, *cd*) states that at program point *pp* and relative call depth *cd* at memory address *p̂os* there is the (abstract) value *v̂*. The values on the stack and in local memory range over an abstract domain. Concretely, we define the abstract domain D̂ to be the set {⊥, ⊤, a<sup>∗</sup>} ∪ ℕ, which constitutes a bounded lattice (D̂, ⊑, ⊔, ⊓, ⊤, ⊥) satisfying ⊥ ⊑ a<sup>∗</sup> ⊑ ⊤ and ⊥ ⊑ n ⊑ ⊤ for all n ∈ ℕ. Intuitively, in our analysis ⊤ will represent unknown (symbolic) values and a<sup>∗</sup> will represent the unknown (symbolic) address of contract c∗.
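The abstract domain D̂ is a flat bounded lattice and can be sketched in a few lines. This is our own encoding (`BOT`, `TOP`, `ASTAR` stand for ⊥, ⊤, a∗; all other elements are Python ints):

```python
BOT, TOP, ASTAR = "bot", "top", "a*"

def leq(x, y):
    # flat lattice ordering: BOT below everything, TOP above everything,
    # and all other elements only comparable to themselves
    return x == BOT or y == TOP or x == y

def join(x, y):
    # least upper bound
    if leq(x, y):
        return y
    if leq(y, x):
        return x
    return TOP   # e.g., two distinct concrete values join to TOP

assert join(3, 3) == 3
assert join(3, 4) == TOP          # incomparable concrete values
assert join(BOT, ASTAR) == ASTAR
assert leq(ASTAR, TOP) and not leq(ASTAR, 5)
```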

Treating the address of the contract under analysis in a symbolic fashion is crucial for obtaining a meaningful analysis, as the address of this account on the blockchain can not easily be assumed to be known upfront. Although discussing this peculiarity is beyond the scope of this paper, a broader presentation of the symbolic address paradigm can be found in the technical report [1].

For performing operations and comparisons on values from the abstract domain, we assume abstract versions of the unary, binary, and comparison operators on the values from D̂. We mark abstract operators with a hat (ˆ·) and, e.g., write +̂ for abstract addition and =̂ for abstract equality. The operators treat ⊤ and a<sup>∗</sup> as arbitrary values, so that, e.g., ⊤ +̂ n evaluates to ⊤ and ⊤ =̂ n evaluates to both *true* and *false* for all n ∈ ℕ.
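The hatted operators can be sketched on the same encoding (`TOP`/`ASTAR` for ⊤/a∗; names are ours). An abstract comparison returns a *set* of truth values, capturing that both outcomes are possible for symbolic operands; we additionally assume EVM word arithmetic modulo 2^256 for concrete operands:

```python
BOT, TOP, ASTAR = "bot", "top", "a*"
SYMBOLIC = {TOP, ASTAR}

def abs_add(x, y):
    # abstract +̂: any symbolic operand makes the result unknown
    if x in SYMBOLIC or y in SYMBOLIC:
        return TOP
    return (x + y) % 2**256   # assuming EVM word arithmetic

def abs_eq(x, y):
    # abstract =̂: returns the set of possible truth values
    if x in SYMBOLIC or y in SYMBOLIC:
        return {True, False}
    return {x == y}

assert abs_add(TOP, 5) == TOP
assert abs_eq(ASTAR, 7) == {True, False}
assert abs_add(2, 3) == 5 and abs_eq(2, 2) == {True}
```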

Formally, we establish the relation between a concrete machine state μ and its abstraction by an abstraction function that translates machine states to a set of closed analysis facts. Table 3 shows the abstraction function α*<sup>µ</sup>* that maps a local machine state to an abstract state consisting of a set of analysis facts. The abstraction is defined with respect to the relative call depth *cd* of the execution and a value abstraction function ˚· that maps concrete values into values from the abstract domain. The function ˚· maps all concrete values to the corresponding (concrete) values in the abstract domain, except for those values that can potentially represent the address of contract c∗; these are translated to a<sup>∗</sup> and thereby over-approximated. This treatment might introduce spurious counterexamples with respect to the concrete execution of the real contract on the blockchain (where it is assigned a concrete address). On the one hand, this is because the abstraction assumes the concrete value of the address to be arbitrary. On the other hand, abstract computations involving a<sup>∗</sup> always result in ⊤, so possible constraints on their results are lost. The first source of imprecision, however, should not be considered an imprecision per se: as c∗'s address is not assumed to be known statically, the goal of the abstraction is precisely to over-approximate the executions with all possible addresses.

The translation proceeds by creating a set of instances of the machine state predicates. For creating instances of the MState*pp* predicate, the concrete values *aw* and *gas* are over-approximated by ˚*aw* and ˚*gas* respectively, and the stack is translated to an abstract array representation using the function stackToArray. The instances of the memory predicate are created by translating the memory mapping m to a relational representation with abstract locations and values.<sup>4</sup>

**Table 3.** Abstraction function for the local machine state *µ*

$$\begin{split} \alpha\_{\mu}((gas, pc, m, aw, s), cd) := {}&\{\mathsf{MState}\_{(id^\*,pc)}(\mathsf{stackToArray}(s), \mathring{aw}, \mathring{gas}, cd)\}\\ &\cup \{\mathsf{Mem}\_{(id^\*,pc)}(\mathring{pos}, \mathring{v}, cd) \mid m[pos] = v \land pos \le 2^{256}\} \end{split}$$
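The abstraction function of Table 3 can be sketched executably. This is our own encoding (facts are tuples, the stack-to-array translation and the ring-accented value abstraction are simplified; `a_star_candidates` is a hypothetical parameter standing for the values that might be c∗'s address):

```python
def ring(v, a_star_candidates):
    # value abstraction: potential addresses of c* become the symbolic "a*",
    # all other concrete values map to themselves
    return "a*" if v in a_star_candidates else v

def alpha_mu(mu, cd, id_star, a_star_candidates=frozenset()):
    gas, pc, m, aw, s = mu
    # stackToArray: map array index 0 to the bottom of the stack
    sa = {i: ring(v, a_star_candidates) for i, v in enumerate(reversed(s))}
    mstate = ("MState", (id_star, pc),
              (len(s), tuple(sorted(sa.items()))),
              ring(aw, a_star_candidates), ring(gas, a_star_candidates), cd)
    mems = {("Mem", (id_star, pc), pos, ring(v, a_star_candidates), cd)
            for pos, v in m.items()}
    return {mstate} | mems

facts = alpha_mu((10, 0, {3: 7}, 1, [5]), 0, "id*")
assert ("Mem", ("id*", 0), 3, 7, 0) in facts
assert len(facts) == 2   # one MState fact plus one Mem fact
```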

#### **6.3 Abstract Execution Rules**

As all state predicates are parametrized by their program points, the abstract semantics needs to be formulated with respect to program points as well. More precisely, this means that for each program counter of contract c<sup>∗</sup> a set of Horn clauses is created that describes the semantics of the instruction at this program counter. Formally, a function ⟦·⟧ (indexed by the program point *pp* and the contract c∗) is defined that creates the required set of rules, given that the instruction *inst* is at position *pc* of contract c∗'s code.

<sup>4</sup> The reason for using a separate predicate for representing local memory instead of encoding it as an argument of array type in the main machine state predicate is purely technical: for modeling memory usage correctly we would need a rich set of array operations that are however not supported by the fixedpoint engines of modern SMT solvers.

Table 4 shows an excerpt of the rules of ⟦·⟧ for the ADD instruction. The main functionality is captured by Horn clause (1), which describes how the machine stack and the gas evolve when executing ADD. First, the precondition is checked that sufficient gas and enough stack elements are available. Then the two (abstract) top elements x̂ and ŷ are extracted from the stack and their sum is written to the top of the stack, while the overall stack size is reduced by 1. In addition, the local gas value is reduced by 3 in an abstract fashion. In the memory rule (Horn clause (2)), again the preconditions are checked and then (as memory is not affected by the ADD instruction) the memory is propagated. This propagation is needed due to the memory predicate's parametrization with the program counter: for making the memory accessible in the next execution step, its values need to be written into the corresponding predicate for the next program counter. Finally, Horn clauses (3) and (4) characterize the exception cases: an exception while executing the ADD instruction can occur either because of a stack underflow or because the execution runs out of gas. In both cases the exception state is entered, which is indicated by recording the relative call depth of the exception in the predicate Exc*<sup>id</sup>*<sup>∗</sup> (*cd*).
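Horn clause (1) can be sketched as a function from a closed MState fact to the derivable successor fact. This is our own encoding (facts as tuples, `"top"`/`"a*"` for the symbolic values; abstract operators are inlined), not EtherTrust's actual clause representation:

```python
def apply_add_clause1(fact, pc):
    """Sketch of Horn clause (1): derive the MState fact after an abstract ADD."""
    name, (cid, fpc), (size, sa), aw, gas, cd = fact   # sa is a dict here
    symbolic = {"top", "a*"}
    if name != "MState" or fpc != pc or size <= 1:
        return set()
    if gas not in symbolic and gas < 3:   # abstract gas check: ĝas >= 3
        return set()
    x, y = sa[size - 1], sa[size - 2]
    s = "top" if x in symbolic or y in symbolic else x + y   # x +̂ y
    g = "top" if gas in symbolic else gas - 3                # ĝas -̂ 3
    sa2 = {i: v for i, v in sa.items() if i < size - 1}      # drop the old top
    sa2[size - 2] = s
    return {("MState", (cid, pc + 1),
             (size - 1, tuple(sorted(sa2.items()))), aw, g, cd)}

fact = ("MState", ("id*", 4), (2, {0: 1, 1: 2}), 0, 10, 0)
assert apply_add_clause1(fact, 4) == {("MState", ("id*", 5), (1, ((0, 3),)), 0, 7, 0)}
assert apply_add_clause1(fact, 9) == set()   # clause only fires at its program point
```

In EtherTrust itself such clauses are handed to an SMT solver's fixedpoint engine rather than applied by hand; the sketch only illustrates the derivation step.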

By allowing gas values to come from the abstract domain, we enable a symbolic treatment of gas. In particular, this means that when starting the analysis with gas value ⊤, all gas calculations directly result in ⊤ again (and could therefore be omitted), and all checks on the gas evaluate to both *true* and *false*; consequently, both paths (regular execution via Horn clauses (1) and (2), and exception via Horn clause (4)) are always triggered in the analysis.

For over-approximating the semantics of call instructions, more involved abstractions are needed. We will illustrate these abstractions in the following in an intuitive way and refer to [1] for the technical details. Note that in the following we will assume CALL instructions to be the only kind of transaction initiating instructions that are contained in the contracts that we consider for analysis. A generalization of the analysis that allows for incorporating also other call types is presented in [1].

As we consider c<sup>∗</sup> to be the only known contract, whenever a call is performed that is not a self-call, we need to assume that an arbitrary contract c? gets executed. The general idea for over-approximating calls to an unknown contract c? is that only those execution states that represent executions of contract c<sup>∗</sup> are over-approximated. Consequently, when a call is performed, all possible effects on future executions of c<sup>∗</sup> that might be caused by the execution of c? (including the initiation of further internal transactions that might cause reentering c∗) need to be captured. For doing this as accurately as possible, we use the following observations:


1. The persistent storage of c<sup>∗</sup> can only be altered by executions of c∗'s own code; hence, at the point of reentering, it is the same as at the point of calling.
2. Reentering c<sup>∗</sup> starts a new internal transaction, whose execution begins at program counter 0 in a fresh machine state.

In general, we can soundly capture the possibility of contract c<sup>∗</sup> being reentered during the execution of c? by assuming to reenter c<sup>∗</sup> at every higher call

#### **Table 4.** Excerpt of the abstract rules for ADD

$$\begin{aligned}
\{\!\!\{\mathtt{ADD}\}\!\!\}_{(id^*,\,pc)} = \{\;
&\mathsf{MState}_{(id^*,\,pc)}((size, sa), av, gas, cd) \land size \geq 2 \land gas \geq 3 \\
&\quad\land \hat{x} = sa(size-1) \land \hat{y} = sa(size-2) \\
&\quad\Rightarrow \mathsf{MState}_{(id^*,\,pc+1)}((size-1, sa[size-2 \mapsto \hat{x} + \hat{y}]), av, gas - 3, cd), &(1)\\
&\mathsf{Mem}_{(id^*,\,pc)}(pos, va, cd) \land \mathsf{MState}_{(id^*,\,pc)}((size, sa), av, gas, cd) \\
&\quad\land size \geq 2 \land gas \geq 3 \Rightarrow \mathsf{Mem}_{(id^*,\,pc+1)}(pos, va, cd), &(2)\\
&\mathsf{MState}_{(id^*,\,pc)}((size, sa), av, gas, cd) \land size < 2 \Rightarrow \mathsf{Exc}_{id^*}(cd), &(3)\\
&\mathsf{MState}_{(id^*,\,pc)}((size, sa), av, gas, cd) \land gas < 3 \Rightarrow \mathsf{Exc}_{id^*}(cd)\;\} &(4)
\end{aligned}$$
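The intuition behind these rules can be prototyped in a few lines of Python. This is a simplified sketch, not EtherTrust's actual encoding: abstract stack values are modeled as concrete integers without a ⊤ element, and only the MState and Exc components of the abstract configuration are tracked.

```python
# Naive one-step application of the abstract ADD rules of Table 4.
# An MState fact is simplified to (pc, stack, gas, call_depth).

def step_add(facts):
    """Apply the ADD rules to a set of MState facts once."""
    mstate, exc = set(), set()
    for (pc, stack, gas, cd) in facts:
        if len(stack) >= 2 and gas >= 3:      # rule (1): pop two, push sum
            x, y = stack[-1], stack[-2]
            new_stack = stack[:-2] + ((x + y) % 2**256,)
            mstate.add((pc + 1, new_stack, gas - 3, cd))
        if len(stack) < 2:                    # rule (3): stack underflow
            exc.add(cd)
        if gas < 3:                           # rule (4): out of gas
            exc.add(cd)
    return mstate, exc

# ADD at pc 0 with stack [2, 3] and sufficient gas:
ms, ex = step_add({(0, (2, 3), 10, 0)})
print(ms)  # {(1, (5,), 7, 0)}
print(ex)  # set()
```

Rule (2), omitted here, would simply propagate all memory facts unchanged to the next program counter.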

**Fig. 5.** Illustration of the abstraction of the semantics for the CALL instruction.

level. To keep the desired precision, we use the previously made observations to impose restrictions on the reenterings of c<sup>∗</sup>: first, we assume the persistent storage of c<sup>∗</sup> to be the same as at the point of calling (observation 1); second, we know that execution starts at program counter 0 in a fresh machine state (observation 2). This allows us to initialize the machine state predicates presented in Table 2 accordingly at program counter zero. All other parts of the global state and the execution environment need to be considered unknown at the point of reentering, as they might have been changed during the execution of c?. In particular, this also applies to the balance of contract c<sup>∗</sup>.

Figure 5 illustrates how the abstract configurations over-approximating the concrete execution states of c<sup>∗</sup> evolve during the execution of the abstract semantics. We write Π S for denoting that an abstract configuration Π (here graphically depicted in gray frames) is an over-approximation of call stack S. The depicted execution starts in the initial execution state s<sub>c∗</sub> of c<sup>∗</sup>. This state is over-approximated by assuming the storage and balance of c<sup>∗</sup>, as well as all other information on the global state, to be unknown and therefore initialized to ⊤ in the corresponding state predicates of the abstract configuration (denoted in the picture by marking the corresponding state components in red). The execution steps representing the executions of local instructions are mimicked step-wise by corresponding abstract execution steps. During these steps, more refined knowledge about the state of c<sup>∗</sup> and its environment might be gained (e.g., the value of some storage cells where information is written, or some restrictions on the accounts' balances, marked in green and blue, respectively). When finally a CALL instruction is executed, every potential reentering of contract c<sup>∗</sup> (here exemplified by execution state t<sub>c∗</sub>) is over-approximated by abstract configurations for every call depth *cd* > 0 that consider all global state and environmental information to be arbitrary, but the parts modeling the persistent storage of c<sup>∗</sup> to be as at the point of calling. In Sect. 7 we show how this abstraction helps us to automatically check smart contracts for single-entrancy in a sound and precise manner. In addition to these over-approximations, which capture the effects on c<sup>∗</sup> during the execution of an unknown contract, over-approximating CALL instructions requires some further abstractions that model the semantics of returning:


For a complete account and formal description of the abstractions, we refer to the full specification of the abstract semantics spelled out in the technical report [1].

#### **7 Verifying Security Properties**

In this section, we will show how the previously presented analysis can be used for proving reachability properties of Ethereum smart contracts in an automated fashion.

To this end, we review EtherTrust [1], the first sound static analyzer for EVM bytecode. EtherTrust translates contract code provided in bytecode format into an internal Horn clause representation. This Horn clause representation, together with facts over-approximating all potential initial configurations, is handed to the SMT solver Z3 [45] via an API. To show that the analyzed contract satisfies a reachability property, the unsatisfiability of the corresponding analysis queries needs to be verified using Z3's fixed-point engine SPACER [46]. If all analysis queries are deemed unsatisfiable, then the contract under analysis is guaranteed to satisfy the original reachability query due to the soundness of the underlying analysis.

In the following we discuss the analysis queries used for verifying single-entrancy and illustrate how these queries capture contracts that are vulnerable to reentrancy, such as the example presented in Sect. 5.

#### **7.1 Over-Approximating Single-Entrancy**

To be able to automatically check for single-entrancy, we need to simplify the original property in order to obtain a description that is expressible within the analysis framework described in Sect. 6. To this end, a strictly stronger property named *call unreachability* is presented and proven to imply single-entrancy:

**Definition 2 (Call unreachability** [1]**).** *A contract* c *is call unreachable if for all initial execution states* (μ, ι, σ) *such that* (μ, ι, σ)*<sup>c</sup> is well formed, it holds that for all transaction environments* Γ *and all call stacks* S

$$\begin{aligned} \neg \exists s, S'.\; &\Gamma \vDash (\mu, \iota, \sigma)_c :: S \to^* s_c :: S' \mathbin{+\!\!+} S \\ &\land\; |S'| > 0 \;\land\; \mathit{code}(c)[s.\mu.\mathrm{pc}] \in \mathit{Inst}_{call} \end{aligned}$$

*with* $\mathit{Inst}_{call} = \{\mathtt{CALL}, \mathtt{CALLCODE}, \mathtt{DELEGATECALL}, \mathtt{CREATE}\}$

Intuitively, this property states that it should not be possible to reach a call instruction of c<sup>∗</sup> after reentering. As we exclude all transaction-initiating instructions but CALL from the analysis, it is sufficient to query for the reachability of a CALL instruction of c<sup>∗</sup> at a higher call depth. More precisely, we end up with the following set of queries:

$$\{\mathsf{MState}_{(id^*,\,pc)}((size, sa), av, gas, cd) \land cd > 0 \mid \mathit{code}(c^*)[pc] = \mathtt{CALL}\}\tag{5}$$

As the MState<sub>(*id*∗, *pc*)</sub> predicate tracks the machine state at every program point, it can also serve as an indicator for the reachability of the program point itself. Consequently, by querying MState<sub>(*id*∗, *pc*)</sub> for all program counters *pc* where c<sup>∗</sup> has a CALL instruction, and additionally requiring a call depth exceeding zero, we can check whether a call instruction is reachable in some reentering execution.
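The effect of query (5) can be sketched as a simple check over derived facts. The fact set and code listing below are hypothetical stand-ins for the analysis output, not EtherTrust's actual interface.

```python
# Query (5): single-entrancy is refuted if some derived MState fact sits at a
# CALL instruction of c* with call depth cd > 0.

def reentering_call_reachable(derived_mstates, code):
    """derived_mstates: set of (pc, cd) projections of derived MState facts."""
    call_pcs = {pc for pc, op in enumerate(code) if op == "CALL"}
    return any(pc in call_pcs and cd > 0 for (pc, cd) in derived_mstates)

code = ["PUSH", "CALL", "STOP"]
print(reentering_call_reachable({(1, 0)}, code))  # False: CALL only at cd == 0
print(reentering_call_reachable({(1, 2)}, code))  # True: CALL reached reentering
```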

#### **7.2 Examples**

We use two examples to show how the analysis detects reentrancy bugs and proves their absence, respectively. To this end, we revisit the contract Bob presented in Sect. 5 and introduce a contract Alice that fixes the reentrancy bug present in Bob. The two contracts are shown in Figure 6.
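The essential difference between the two contracts can be sketched as a small executable model. This is a hypothetical Python rendering, not the Solidity code of Figure 6: Bob performs the external call before setting its sent guard, while Alice sets the guard first, so a reentering callee reaches Bob's call twice but Alice's only once.

```python
# Toy model of the two contracts: guard_first=False behaves like Bob,
# guard_first=True like Alice. The attacker reenters ping once on callback.

class Contract:
    def __init__(self, guard_first):
        self.sent = False
        self.guard_first = guard_first
        self.calls = 0                      # how often the external call is reached

    def ping(self, callback, depth=0):
        if not self.sent:
            if self.guard_first:            # Alice: set guard, then call
                self.sent = True
                self.external_call(callback, depth)
            else:                           # Bob: call, then set guard
                self.external_call(callback, depth)
                self.sent = True

    def external_call(self, callback, depth):
        self.calls += 1
        if depth < 1:                       # the callee reenters once
            callback(self, depth + 1)

def attacker(contract, depth):
    contract.ping(attacker, depth)          # reentering call

bob, alice = Contract(guard_first=False), Contract(guard_first=True)
bob.ping(attacker)
alice.ping(attacker)
print(bob.calls, alice.calls)  # 2 1
```

The counts mirror the analysis outcome: the query of Eq. 5 is satisfiable for the Bob-style contract and unsatisfiable for the Alice-style one.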

**Detecting Reentrancy Bugs.** We illustrate how the analysis detects reentrancy bugs using the example in Figure 6a. To this end, we give a graphical description of the over-approximations performed when analyzing contract Bob, depicted in Figure 7. For the sake of presentation, we give the contract code in Solidity instead of bytecode and argue about it on this level, even though the analysis is carried out on the bytecode level.

As discussed in Sect. 6.3, the analysis considers the execution of contract Bob to start in an unknown environment, which implies that also the value of the


**Fig. 6.** Examples for contracts showing and being robust against the reentrancy bug.

contract's sent variable is unknown and hence initialized to ⊤. As a consequence, the equality check in line 4 is considered to evaluate to both *true* and *false* in the abstract setting (as ⊤ must be considered to potentially equal every concrete value). Accordingly, the analysis needs to consider the then-branch of the conditional and consequently the call in line 4. This call is over-approximated as discussed in Sect. 6.3 and therefore considers reentering contract Bob at an arbitrary call depth. In this situation, the sent variable is still over-approximated by the value ⊤, and therefore the call at line 4 can be reached again, which satisfies the reachability query in Eq. 5.

**Proving Single-Entrancy.** We consider the contract Alice shown in Figure 6b. In contrast to contract Bob, this contract does not have the reentrancy vulnerability, as the guard sent, which should prevent the call instruction in line 5 from being executed more than once, is set before performing the call. As a consequence, when reentering the contract, the guard is already set and stops any further calls. We show that the analysis presented in Sect. 6 is precise enough to prove this contract single-entrant. Intuitively, the abstraction is precise because the contract's persistent storage can be assumed to be unchanged at the point of reentering. Consequently, the then-branch of the conditional can be excluded from the analysis when reentering, and the contract can be proven to be single-entrant. A graphical description of this argument is provided in Figure 8. As for contract Bob, the analysis starts in an abstract configuration that assigns the sent variable the value ⊤, which forces the analysis to consider both the then- and the else-branch of the conditional in line 4. When taking the else-branch, the contract execution terminates without reaching a state satisfying the reachability query. Therefore, it is sufficient to only consider the then-branch for proving the impossibility of re-reaching the call instruction. When executing the call in the then-branch, according to the abstract call semantics, the analysis needs to take all abstract configurations representing executions of Alice at higher call depths into account. However, in each of these abstract configurations it can be assumed that the state of the persistent storage (including the sent variable, highlighted in green) is the same as at the point of calling. As at this point sent was already initialized to the concrete value **true**, the then-branch of the conditional can be excluded from the analysis at any call depth *cd* > 0, and consequently the unreachability of the query in Eq. 5 is proven.

**Fig. 7.** Illustration of the attack detection in contract Bob by the static analysis.

#### **7.3 Discussion**

In this section, we illustrated how the static analysis underlying EtherTrust [1] is in principle capable not only of detecting reentrancy bugs, but also of proving smart contracts single-entrant. In practice, EtherTrust manages to analyze real-world contracts from the blockchain within several seconds, as detailed in the experimental evaluation presented in [1]. Even though EtherTrust produces false positives due to the performed over-approximations, it still shows better precision on a benchmark than the state-of-the-art bug-finding tool Oyente [16] – despite being sound. Similar results are shown when using EtherTrust for checking a simple value-independency property.

In general, EtherTrust could easily be extended to support more properties of contract execution, given that those properties (or over-approximations of them) are expressible as reachability or simple value-independency properties. By contrast, checking more involved hyperproperties, or properties that span more than one external transaction execution, is currently out of scope for EtherTrust.

**Fig. 8.** Illustration of proving single-entrancy of contract Alice by the static analysis.

#### **8 Conclusion**

We presented a systematization of the state of the art in Ethereum smart contract verification and outlined the open challenges in this field. We also discussed in detail the foundations of EtherTrust [1], the first sound static analyzer for EVM bytecode. In particular, we reviewed how the small-step semantics presented in [15] is abstracted into a set of Horn clauses. We also presented how single-entrancy – a relevant smart contract security property – is expressed in terms of queries that can then be automatically solved leveraging the power of an SMT solver.

**Acknowledgments.** This work has been partially supported by the European Research Council (ERC) under the European Union's Horizon 2020 research (grant agreement No 771527-BROWSEC), by Netidee through the project EtherTrust (grant agreement 2158), by the Austrian Research Promotion Agency through the Bridge-1 project PR4DLT (grant agreement 13808694) and COMET K1 SBA.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **Layered Concurrent Programs**

Bernhard Kragl<sup>1</sup> and Shaz Qadeer<sup>2</sup>

<sup>1</sup> IST Austria, Klosterneuburg, Austria bkragl@ist.ac.at <sup>2</sup> Microsoft Research, Redmond, USA

**Abstract.** We present layered concurrent programs, a compact and expressive notation for specifying refinement proofs of concurrent programs. A layered concurrent program specifies a sequence of connected concurrent programs, from most concrete to most abstract, such that common parts of different programs are written exactly once. These programs are expressed in the ordinary syntax of imperative concurrent programs using gated atomic actions, sequencing, choice, and (recursive) procedure calls. Each concurrent program is automatically extracted from the layered program. We reduce refinement to the safety of a sequence of concurrent checker programs, one each to justify the connection between every two consecutive concurrent programs. These checker programs are also automatically extracted from the layered program. Layered concurrent programs have been implemented in the Civl verifier which has been successfully used for the verification of several complex concurrent programs.

#### **1 Introduction**

Refinement is an approach to program correctness in which a program is expressed at multiple levels of abstraction. For example, we could have a sequence of programs P<sub>1</sub>,...,P<sub>h</sub>,P<sub>h+1</sub> where P<sub>1</sub> is the most concrete and P<sub>h+1</sub> is the most abstract. Program P<sub>1</sub> can be compiled and executed efficiently, P<sub>h+1</sub> is obviously correct, and the correctness of P<sub>i</sub> is guaranteed by the correctness of P<sub>i+1</sub> for all i ∈ [1, h]. These three properties together ensure that P<sub>1</sub> is both efficient and correct. To use the refinement approach, the programmer must come up with each version P<sub>i</sub> of the program and a proof that the correctness of P<sub>i+1</sub> implies the correctness of P<sub>i</sub>. This proof typically establishes a connection from every behavior of P<sub>i</sub> to some behavior of P<sub>i+1</sub>.

Refinement is an attractive approach to the verified construction of complex programs for a number of reasons. First, instead of constructing a single monolithic proof of P<sub>1</sub>, the programmer constructs a collection of localized proofs establishing the connection between P<sub>i</sub> and P<sub>i+1</sub> for each i ∈ [1, h]. Each localized proof is considerably simpler than the overall proof because it only needs to reason about the (relatively small) difference between adjacent programs. Second, different localized proofs can be performed using different reasoning methods, e.g., interactive deduction, automated testing, or even informal reasoning.

© The Author(s) 2018

H. Chockler and G. Weissenbacher (Eds.): CAV 2018, LNCS 10981, pp. 79–102, 2018. https://doi.org/10.1007/978-3-319-96145-3\_5

**Fig. 1.** Concurrent programs P<sub>i</sub> and connecting checker programs C<sub>i</sub> represented by a layered concurrent program LP.

Finally, refinement naturally supports a bidirectional approach to correctness: bottom-up verification of a concrete program via successive abstraction, or top-down derivation from an abstract program via successive concretization.

This paper explores the use of refinement to reason about concurrent programs. Most refinement-oriented approaches model a concurrent program as a flat transition system, a representation that is useful for abstract programs but becomes increasingly cumbersome for a concrete implementation. To realize the goal of verified construction of efficient and implementable concurrent programs, we must be able to uniformly and compactly represent both highly-detailed and highly-abstract concurrent programs. This paper introduces layered concurrent programs as such a representation.

A layered concurrent program LP represents a sequence P<sub>1</sub>,...,P<sub>h</sub>,P<sub>h+1</sub> of concurrent programs such that common parts of different programs are written exactly once. These programs are expressed not as flat transition systems but in the ordinary syntax of imperative concurrent programs using gated atomic actions [4], sequencing, choice, and (recursive) procedure calls. Our programming language is accompanied by a type system that allows each P<sub>i</sub> to be automatically extracted from LP. Finally, refinement between P<sub>i</sub> and P<sub>i+1</sub> is encoded as the safety of a checker program C<sub>i</sub> which is also automatically extracted from LP. Thus, the verification of P<sub>1</sub> is split into the verification of h concurrent checker programs C<sub>1</sub>,...,C<sub>h</sub> such that C<sub>i</sub> connects P<sub>i</sub> and P<sub>i+1</sub> (Fig. 1).

We highlight two crucial aspects of our approach. First, while the programs P<sub>i</sub> have an interleaved (i.e., preemptive) semantics, we verify the checker programs C<sub>i</sub> under a cooperative semantics in which preemptions occur only at procedure calls. Our type system [5], based on the theory of right and left movers [10], ensures that the cooperative behaviors of C<sub>i</sub> cover all preemptive behaviors of P<sub>i</sub>. Second, establishing the safety of checker programs is not tied to any particular verification technique. Any applicable technique can be used. In particular, different layers can be verified using different techniques, allowing for great flexibility in verification options.

#### **1.1 Related Work**

This paper formalizes, clarifies, and extends the most important aspect of the design of Civl [6], a deductive verifier for layered concurrent programs. Hawblitzel et al. [7] present a partial explanation of Civl by formalizing the connection between two concurrent programs as sound program transformations. In this paper, we provide the first formal account for layered concurrent programs to represent all concurrent programs in a multi-layered refinement proof, thereby establishing a new foundation for the verified construction of concurrent programs.

Civl is the successor to the Qed [4] verifier which combined a type system for mover types with logical reasoning based on verification conditions. Qed enabled the specification of a layered proof but required each layer to be expressed in a separate file leading to code duplication. Layered programs reduce redundant work in a layered proof by enabling each piece of code to be written exactly once. Qed also introduced the idea of abstracting an atomic action to enable attaching a stronger mover type to it. This idea is incorporated naturally in layered programs by allowing a concrete atomic action to be wrapped in a procedure whose specification is a more abstract atomic action with a more precise mover type.

Event-B [1] is a modeling language that supports refinement of systems expressed as interleaved composition of events, each specified as a top-level transition relation. Verification of Event-B specifications is supported by the Rodin [2] toolset which has been used to model and verify several systems of industrial significance. TLA+ [9] also specifies systems as a flat transition system, enables refinement proofs, and is more general because it supports liveness specifications. Our approach to refinement is different from Event-B and TLA+ for several reasons. First, Event-B and TLA+ model different versions of the program as separate flat transition systems whereas our work models them as different layers of a single layered concurrent program, exploiting the standard structuring mechanisms of imperative programs. Second, Event-B and TLA+ connect the concrete program to the abstract program via an explicitly specified refinement mapping. Thus, the guarantee provided by the refinement proof is contingent upon trusting both the abstract program and the refinement mapping. In our approach, once the abstract program is proved to be free of failures, the trusted part of the specification is confined to the gates of atomic actions in the concrete program. Furthermore, the programmer never explicitly specifies a refinement mapping and is only engaged in proving the correctness of checker programs.

The methodology of refinement mappings has been used for compositional verification of hardware designs [11,12]. The focus in this work is to decompose a large refinement proof connecting two versions of a hardware design into a collection of smaller proofs. A variety of techniques including compositional reasoning (converting a large problem to several small problems) and customized abstractions (for converting infinite-state to finite-state problems) are used to create small and finite-state verification problems for a model checker. This work is mostly orthogonal to our contribution of layered programs. Rather, it could be considered an approach to decompose the verification of each (potentially large) checker program encoded by a layered concurrent program.

#### **2 Concurrent Programs**

In this section we introduce a concurrent programming language. The syntax of our programming language is summarized in Fig. 2.

#### **Fig. 2.** Concurrent programs

**Preliminaries.** Let *Val* be a set of *values* containing the Booleans. The set of *variables Var* is partitioned into *global variables GVar* and *local variables LVar* . A *store* σ is a mapping from variables to values, an *expression* e is a mapping from stores to values, and a *transition* t is a binary relation between stores.

**Atomic Actions.** A fundamental notion in our approach is that of an atomic action. An atomic action captures an indivisible operation on the program state together with its precondition, providing a universal representation for both low-level machine operations (e.g., reading a variable from memory) and high-level abstractions (e.g., atomic procedure summaries). Most importantly for reasoning purposes, our programming language confines all accesses to global variables to atomic actions. Formally, an *atomic action* is a tuple (I, O, e, t). The semantics of an atomic action in an execution is to first evaluate the expression e, called the *gate*, in the current state. If the gate evaluates to *false* the execution *fails*, otherwise the program state is updated according to the transition t. *Input variables* in I can be read by e and t, and *output variables* in O can be written by t.

*Remark 1.* Atomic actions subsume many standard statements, in particular (nondeterministic) assignments, assertions, and assumptions. The following table shows some examples for programs over variables x and y.
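This execution model can be sketched in a few lines of Python, restricted here to deterministic transitions (the paper defines t as a binary relation between stores); the names below are our own, not Civl's.

```python
# An atomic action is modeled as a (gate, transition) pair over a store (dict).
# The gate is evaluated first; if it is false the execution fails, otherwise
# the transition produces the updated store.

class Failed(Exception):
    pass

def run(action, store):
    gate, transition = action
    if not gate(store):
        raise Failed("gate evaluated to false")
    return transition(store)

# x := y + 1   -- gate is trivially true, transition writes x
assign_x = (lambda s: True, lambda s: {**s, "x": s["y"] + 1})
# assert x > 0 -- gate is the asserted expression, identity transition
assert_x = (lambda s: s["x"] > 0, lambda s: s)
# assume and havoc are encoded similarly: assume constrains the transition
# relation, havoc x relates a store to all stores differing only in x.

s = run(assign_x, {"x": 0, "y": 4})
s = run(assert_x, s)                # gate holds, store unchanged
print(s)  # {'x': 5, 'y': 4}
```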


**Procedures.** A *procedure* is a tuple (I, O, L, s) where I, O, L are the *input*, *output*, and *local variables* of the procedure, and s is a *statement* composed from skip, sequencing, if, and parallel call statements. Since only atomic actions can refer to global variables, the variables accessed in if conditions are restricted to the inputs, outputs, and locals of the enclosing procedure. The meaning of skip, sequencing, and if is as expected and we focus on parallel calls.

**Pcalls.** A *parallel call* (*pcall*, for short) pcall (A, ι, o) (P, ι, o) (A, ι, o) consists of a sequence of invocations of atomic actions and procedures. We refer to the invocations as the *arms* of the pcall. In particular, (A, ι, o) is an *atomic-action arm* and (P, ι, o) is a *procedure arm*. An atomic-action arm executes the called atomic action, and a procedure arm creates a child thread that executes the statement of the called procedure. The parent thread is blocked until all arms of the pcall finish. In the standard semantics the order of arms does not matter, but our verification technique will allow us to consider the atomic-action arms before and after the procedure arms to execute in the specified order. Parameter passing is expressed using partial mappings ι, o between local variables; ι maps formal inputs of the callee to actual inputs of the caller, and o maps actual outputs of the caller to formal outputs of the callee. Since we do not want to introduce races on local variables, the outputs of all arms must be disjoint and the output of one arm cannot be an input to another arm. Finally, notice that our general notion of a pcall subsumes sequential statements (single atomic-action arm), synchronous procedure calls (single procedure arm), and unbounded thread creation (recursive procedure arm).

**Concurrent Programs.** A *concurrent program* P is a tuple (*gs*, *as*, *ps*, m, I), where *gs* is a finite set of global variables used by the program, *as* is a finite mapping from *action names* A to atomic actions, *ps* is a finite mapping from *procedure names* P to procedures, m is either a procedure or action name that denotes the entry point for program executions, and I is a set of initial stores. For convenience we will liberally use action and procedure names to refer to the corresponding atomic actions and procedures.

**Semantics.** Let P = (*gs*, *as*, *ps*, m, I) be a fixed concurrent program. A *state* consists of a global store assigning values to the global variables and a pool of *threads*, each consisting of a local store assigning values to local variables and a statement that remains to be executed. An *execution* is a sequence of states, where from each state to the next some thread is selected to execute one step. Every step that switches the executing thread is called a *preemption* (also called a context switch). We distinguish between two semantics that differ in (1) preemption points, and (2) the order of executing the arms of a pcall.

In *preemptive semantics*, a preemption is allowed anywhere and the arms of a pcall are arbitrarily interleaved. In *cooperative semantics*, a preemption is allowed only at the call and return of a procedure, and the arms of a pcall are executed as follows. First, the leading atomic-action arms are executed from left to right without preemption, then all procedure arms are executed arbitrarily interleaved, and finally the trailing atomic-action arms are executed, again from left to right without preemption. In other words, a preemption is only allowed when a procedure arm of a pcall creates a new thread and when a thread terminates.
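The arm ordering under cooperative semantics can be sketched as follows. This is a toy model with tagged arms, not the actual semantics formalization; in particular, the procedure arms returned here would run arbitrarily interleaved as child threads.

```python
# Under cooperative semantics a pcall executes its leading atomic-action arms
# left to right without preemption, then its procedure arms (interleaved),
# then its trailing atomic-action arms, again in order.

def cooperative_schedule(arms):
    """arms: list of ("action", name) or ("proc", name) tags."""
    i = 0
    leading = []
    while i < len(arms) and arms[i][0] == "action":
        leading.append(arms[i][1]); i += 1
    procs = []
    while i < len(arms) and arms[i][0] == "proc":
        procs.append(arms[i][1]); i += 1   # child threads, run interleaved
    trailing = [name for _, name in arms[i:]]
    return leading, procs, trailing

pcall = [("action", "A1"), ("action", "A2"), ("proc", "P"), ("action", "A3")]
print(cooperative_schedule(pcall))  # (['A1', 'A2'], ['P'], ['A3'])
```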

For P we only consider executions that start with a single thread that execute m from a store in I. P is called *safe* if there is no failing execution, i.e., an execution that executes an atomic action whose gate evaluates to *false*. We write *Safe*(P) if P is safe under preemptive semantics, and *CSafe*(P) if P is safe under cooperative semantics.

#### **2.1 Running Example**

In this section, we introduce a sequence of three concurrent programs (Fig. 3) to illustrate features of our concurrent programming language and the layered approach to program correctness. Consider the program P<sub>1</sub><sup>lock</sup> in Fig. 3(a). The program uses a single global Boolean variable b which is accessed by the two atomic actions CAS and RESET. The compare-and-swap action CAS atomically reads the current value of b and either sets b from *false* to *true* and returns *true*, or leaves b *true* and returns *false*. The RESET action sets b to *false* and has a gate (represented as an assertion) that states that the action must only be called when b is *true*. Using these actions, the procedures Enter and Leave implement a spinlock as follows. Enter calls the CAS action and retries (through recursion on itself) until it succeeds to set b from *false* to *true*. Leave just calls the RESET action which sets b back to *false* and thus allows another thread executing Enter to stop spinning. Finally, the procedures Main and Worker serve as a simple client. Main uses a pcall inside a nondeterministic if statement to create an unbounded number of concurrent worker threads, which just acquire the lock by calling Enter and then release the lock again by calling Leave. The call to the empty procedure Alloc is an artifact of our extraction from a layered concurrent program and can be removed as an optimization.
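The spinlock protocol described above can be sketched in Python, with a host mutex standing in for the atomicity of the primitive actions (in the paper CAS and RESET are primitive atomic actions; the client with a shared counter is our own addition).

```python
# Spinlock in the style of the lock example: CAS and RESET are made atomic via
# a host mutex, and RESET's assert plays the role of its gate.

import threading

b = False
cas_mutex = threading.Lock()
counter = 0  # protected by the spinlock; would race otherwise

def CAS():            # atomically: if not b then b := true, return true
    global b
    with cas_mutex:
        if not b:
            b = True
            return True
        return False

def RESET():          # gate: b must be true when RESET is called
    global b
    with cas_mutex:
        assert b, "gate of RESET violated"
        b = False

def worker():
    global counter
    while not CAS():  # Enter: spin until the CAS succeeds
        pass
    counter += 1      # critical section
    RESET()           # Leave

threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads: t.start()
for t in threads: t.join()
print(counter)  # 8
```

Because every RESET is preceded by a successful CAS in the same thread, the gate of RESET never fails, which is exactly the safety property discussed next.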

Proving P<sub>1</sub><sup>lock</sup> safe amounts to showing that RESET is never called with b set to *false*, which expresses that P<sub>1</sub><sup>lock</sup> follows a locking discipline of releasing only previously acquired locks. Doing this proof directly on P<sub>1</sub><sup>lock</sup> has two drawbacks. First, the proof must relate the possible values of b with the program counters of all running threads. In general, this approach requires sound introduction of ghost code and results in complicated case distinctions in program invariants. Second, the proof is not reusable across different lock implementations. The correctness of the client does not specifically depend on using a spinlock over a Boolean variable, and thus neither should the proof. We show how our refinement-based approach addresses both problems.

Program P<sub>2</sub><sup>lock</sup> in Fig. 3(b) is an abstraction of P<sub>1</sub><sup>lock</sup> that introduces an abstract lock specification. The global variable b is replaced by lock which ranges over integer thread identifiers (0 is a dedicated value indicating that the lock is available). The procedures Alloc, Enter and Leave are replaced by the atomic actions ALLOC, ACQUIRE and RELEASE, respectively. ALLOC allocates unique and non-zero thread identifiers using a set of integers slots to store the identifiers not allocated so far. ACQUIRE blocks executions where the lock is not

**Fig. 3.** Lock example

available (assume lock == 0) and sets lock to the identifier of the acquiring thread. RELEASE asserts that the releasing thread holds the lock and sets lock to 0. Thus, the connection between P<sub>1</sub><sup>lock</sup> and P<sub>2</sub><sup>lock</sup> is given by the invariant b <==> lock != 0 which justifies that Enter refines ACQUIRE and Leave refines RELEASE. The potential safety violation in P<sub>1</sub><sup>lock</sup> by the gate of RESET is preserved in P<sub>2</sub><sup>lock</sup> by the gate of RELEASE. In fact, the safety of P<sub>2</sub><sup>lock</sup> expresses the stronger locking discipline that the lock can only be released by the thread that acquired it.

Reasoning in terms of ACQUIRE and RELEASE instead of Enter and Leave is more general, but it is also simpler! Figure 3(b) declares atomic actions with a *mover type* [5], right for *right mover* and left for *left mover*. A right mover executed by a thread commutes to the right of any action executed by a different thread. Similarly, a left mover executed by a thread commutes to the left of any action executed by a different thread. A sequence of right movers followed by at most one non-mover followed by a sequence of left movers in a thread can be considered atomic [10]. The reason is that any interleaved execution can be rearranged (by commuting atomic actions) such that these actions execute consecutively. For P<sub>2</sub><sup>lock</sup> this means that Worker is atomic, and thus the gate of RELEASE can be discharged by pure sequential reasoning; ALLOC guarantees tid != 0 and after executing ACQUIRE we have lock == tid. As a result, we finally obtain that the atomic action SKIP in P<sub>3</sub><sup>lock</sup> (Fig. 3(c)) is a sound abstraction of procedure Main in P<sub>2</sub><sup>lock</sup>. Hence, we showed that program P<sub>1</sub><sup>lock</sup> is safe by soundly abstracting it to P<sub>3</sub><sup>lock</sup>, a program that is trivially safe.

The correctness of the right and left annotations on ACQUIRE and RELEASE, respectively, depends on pairwise commutativity checks among the atomic actions in P<sup>lock</sup><sub>2</sub>. These commutativity checks will fail unless we exploit the fact that every thread identifier allocated by Worker using the ALLOC action is unique. For instance, to show that ACQUIRE executed by a thread commutes to the right of RELEASE executed by a different thread, it must be known that the parameters tid to these actions are distinct from each other. The *linear* annotation on the local variables named tid and the global variable slots (which is a set of integers) is used to communicate this information.

The overall invariant encoded by the *linear* annotation is that the sets of values stored in slots and in the local linear variables of active stack frames across all threads are pairwise disjoint. This invariant is guaranteed by a combination of a linear type system [14] and logical reasoning on the code of all atomic actions. The linear type system uses a flow analysis to ensure that a value stored in a linear variable in an active stack frame is not copied into another linear variable via an assignment. Each atomic action must ensure that its state update preserves the disjointness invariant for linear variables. For the actions ACQUIRE and RELEASE, which do not modify any linear variables, this reasoning is trivial. However, the action ALLOC modifies slots and updates the linear output parameter tid. Its correctness depends on the (semantic) fact that the value put into tid is removed from slots; this reasoning can be done using automated theorem provers.
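The disjointness invariant behind the *linear* annotation can be pictured with a small toy model (ours, not CIVL code): allocation must *move* a value out of slots rather than duplicate it, so that slots and the values held by threads stay pairwise disjoint.

```python
# Toy model (ours) of linear allocation: a fresh id leaves `slots` and is
# then held by exactly one stack frame, preserving pairwise disjointness.
def alloc(slots, held):
    """Pick a fresh id: remove it from slots and hand it to a thread."""
    tid = min(slots)
    slots.remove(tid)          # the value *leaves* slots ...
    held.append(tid)           # ... and is now held by exactly one frame
    return tid

def disjoint(slots, held):
    """The invariant: held values are distinct and absent from slots."""
    return len(held) == len(set(held)) and not (slots & set(held))

slots = set(range(1, 5))
held = []
t1 = alloc(slots, held)
t2 = alloc(slots, held)
assert t1 != t2 and disjoint(slots, held)
```

A faulty ALLOC that copied a value instead of removing it would violate `disjoint`, which is exactly the semantic check discharged by the theorem prover.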

#### **3 Layered Concurrent Programs**

A layered concurrent program represents a sequence of concurrent programs that are connected to each other. That is, the programs derived from a layered concurrent program share syntactic structure, but differ in the granularity of the atomic actions and the set of variables they are expressed over. In a layered concurrent program, we associate layer numbers and layer ranges with variables (both global and local), atomic actions, and procedures. These layer numbers control the introduction and hiding of program variables and the summarization of compound operations into atomic actions, and thus provide the scaffolding of a refinement relation. Concretely, this section shows how the concurrent programs P<sup>lock</sup><sub>1</sub>, P<sup>lock</sup><sub>2</sub>, and P<sup>lock</sup><sub>3</sub> (Fig. 3) and their connections can all be expressed in a single layered concurrent program. In Sect. 4, we discuss how to check refinement between the successive concurrent programs encoded in a layered concurrent program.

**Syntax.** The syntax of layered concurrent programs is summarized in Fig. 4. Let N be the set of non-negative integers and I the set of nonempty *intervals* [a, b].

We refer to integers as *layer numbers* and intervals as *layer ranges*. A *layered concurrent program* LP is a tuple (*GS*, *AS*, *IS*, *PS*, m, *I*) which, similarly to concurrent programs, consists of global variables, atomic actions, and procedures, with the following differences.


The pcall<sup>α</sup> statement in a layered concurrent program differs from the pcall statement in concurrent programs in two ways. First, it can only have procedure arms. Second, it has a parameter α which is either ε (*unannotated pcall*) or the index of one of its arms (*annotated pcall*). We usually omit writing ε in unannotated pcalls.

5. m is a procedure name.

The *top layer* h of a layered concurrent program is the disappearing layer of m.

**Intuition Behind Layer Numbers.** Recall that a layered concurrent program LP should represent a sequence of h + 1 concurrent programs P<sub>1</sub>, …, P<sub>h+1</sub> that are connected by a sequence of h checker programs C<sub>1</sub>, …, C<sub>h</sub> (cf. Fig. 1). Before we provide formal definitions, let us get some intuition on two core mechanisms: global variable introduction and procedure abstraction/refinement.

Let v be a global variable with layer range [a, b]. The meaning of this layer range is that the "first" program that contains v is C<sub>a</sub>, the checker program connecting P<sub>a</sub> and P<sub>a+1</sub>. In particular, v is not yet part of P<sub>a</sub>. In C<sub>a</sub> the introduction actions at layer a can modify v and thus assign its meaning in terms of all other available variables. Then v is part of P<sub>a+1</sub> and all programs up to and including P<sub>b</sub>. The "last" program containing v is C<sub>b</sub>. In other words, when going from a program P<sub>i</sub> to P<sub>i+1</sub> the variables with upper bound i disappear and the variables with lower bound i are introduced; the checker program C<sub>i</sub> has access to both and establishes their relationship.
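The layer-range bookkeeping above can be sketched as two small helper functions (names and encoding ours, not the paper's): a variable with range [a, b] is introduced by C<sub>a</sub>, lives in P<sub>a+1</sub> through P<sub>b</sub>, and is last seen by C<sub>b</sub>.

```python
# Sketch (ours) of which programs contain a variable with layer range [a, b],
# given top layer h: programs P_1 .. P_{h+1} and checkers C_1 .. C_h.
def programs_with(a, b, h):
    """Indices l such that P_l contains the variable."""
    return [l for l in range(1, h + 2) if a + 1 <= l <= b]

def checkers_with(a, b, h):
    """Indices l such that the checker C_l sees the variable."""
    return [l for l in range(1, h + 1) if a <= l <= b]

# Lock example (top layer h = 2): b has range [0, 1], pos has range [1, 1].
assert programs_with(0, 1, 2) == [1]       # b exists only in P_1
assert programs_with(1, 1, 2) == []        # pos is in no P_l ...
assert checkers_with(1, 1, 2) == [1]       # ... but the checker C_1 sees it
```

This matches the running example in Sect. 3.3, where pos is both introduced and hidden at layer 1 and therefore appears only in the checker program C<sub>1</sub>.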

Let P be a procedure with disappearing layer n and refined atomic action A. The meaning of the disappearing layer is that P exists in all programs from P<sub>1</sub> up to and including P<sub>n</sub>. In P<sub>n+1</sub> and above every invocation of P is replaced by an invocation of A. To ensure that this replacement is sound, the checker program C<sub>n</sub> performs a refinement check that ensures that every execution of P behaves like A. Observe that the body of procedure P itself changes from P<sub>1</sub> to P<sub>n</sub> according to the disappearing layer of the procedures it calls.

With the above intuition in mind it is clear that the layer annotations in a layered concurrent program cannot be arbitrary. For example, if procedure P calls a procedure Q, then Q cannot have a higher disappearing layer than P, for Q could introduce further behaviors into the program after P was replaced by A, and those behaviors are not captured by A.

#### **3.1 Type Checker**

We describe the constraints that need to be satisfied for a layered concurrent program to be well-formed. A full formalization as a type checker with top-level judgment ⊢ LP is given in Fig. 5. For completeness, the type checker includes standard constraints (e.g., variable scoping, parameter passing, etc.) that we are not going to discuss.

**(Atomic Action)/(Introduction Action).** Global variables can only be accessed by atomic actions and introduction actions. For a global variable v with layer range [a, b], introduction actions with layer number a are allowed to modify v (for sound variable introduction), and atomic actions with a layer range contained in [a + 1, b] have access to v. Introduction actions must be nonblocking, which means that every state that satisfies the gate must have at least one outgoing transition. This ensures that introduction actions only assign meaning to introduced variables but do not exclude any program behavior.
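These two access rules can be phrased as small predicates (a sketch of ours, with our own names; the layer range assumed for lock below is an illustration consistent with Sect. 3.3):

```python
# Sketch (ours) of the global-variable access rules: for a variable with
# layer range [a, b], an introduction action at layer a may modify it, and
# an atomic action may access it if its layer range lies within [a+1, b].
def may_modify(intro_layer, var_range):
    a, b = var_range
    return intro_layer == a

def may_access(action_range, var_range):
    a, b = var_range
    lo, hi = action_range
    return a + 1 <= lo and hi <= b

# Lock example: b has range [0, 1]; lock is assumed here to have range [1, 2].
assert may_modify(1, (1, 2))            # iSetLock (layer 1) may modify lock
assert may_access((2, 2), (1, 2))       # ACQUIRE (range [2, 2]) accesses lock
assert not may_access((2, 2), (0, 1))   # ... but not b, hidden at layer 1
```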

**(If).** Procedure bodies change from layer to layer because calls to procedures become calls to atomic actions. But the control-flow structure within a procedure is preserved across layers. Therefore (local) variables accessed in an if condition must be available on all layers to ensure that the if statement is well-defined on every layer.

**(Introduction Call).** Let A be an introduction action with layer number n. Since A modifies global variables introduced at layer n, icalls to A are only allowed from procedures with disappearing layer n. Similarly, the formal output parameters of an icall to A must have introduction layer n. The icall is only preserved in C<sub>n</sub>.

$$\begin{aligned} \widehat{GS}(v) &= [a+1, b] \text{ for } GS(v) = [a, b] \\ ReadVars(e) &= \{v \mid \exists\, \sigma, a: e(\sigma) \neq e(\sigma[v \mapsto a])\} \\ ReadVars(t) &= \{v \mid \exists\, \sigma, \sigma', a: (\sigma, \sigma') \in t \land (\sigma[v \mapsto a], \sigma') \notin t\} \\ ReadVars(e, t) &= ReadVars(e) \cup ReadVars(t) \\ WriteVars(t) &= \{v \mid \exists\, \sigma, \sigma': (\sigma, \sigma') \in t \land \sigma(v) \neq \sigma'(v)\} \\ Nonblocking(e, t) &= \forall\, \sigma \in e: \exists\, \sigma': (\sigma, \sigma') \in t \end{aligned}$$

**Fig. 5.** Type checking rules for layered concurrent programs

**(Parallel Call).** All arms in a pcall must be procedure arms invoking a procedure with a disappearing layer less than or equal to the disappearing layer of the caller. Furthermore, above the disappearing layer of the callee its refined atomic action must be available up to the disappearing layer of the caller. Parameter passing can only be well-defined if the actual inputs exist before the formal inputs, and the formal outputs exist before the actual outputs. The sequence of disappearing layers of the procedures in a pcall must be monotonically increasing and then decreasing, such that the resulting pcall in the extracted programs consists of procedure arms surrounded by atomic-action arms on every layer.
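The increasing-then-decreasing constraint on the disappearing layers of pcall arms can be checked with a single scan (a sketch of ours; we read "monotonically" as non-strict):

```python
# Sketch (ours): the disappearing layers of the arms of a pcall must rise
# (non-strictly) and then fall, so that on every layer the extracted pcall is
# a block of procedure arms surrounded by atomic-action arms.
def increasing_then_decreasing(layers):
    i, n = 0, len(layers)
    while i + 1 < n and layers[i] <= layers[i + 1]:
        i += 1                      # consume the rising prefix
    while i + 1 < n and layers[i] >= layers[i + 1]:
        i += 1                      # consume the falling suffix
    return i == n - 1               # every arm consumed exactly once

assert increasing_then_decreasing([1, 2, 3, 2])
assert not increasing_then_decreasing([2, 1, 2])
```

A shape like [2, 1, 2] is rejected because at layer 2 the middle arm would already be an atomic-action arm sitting *between* two procedure arms.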

Annotated pcalls are only used for invocations of procedures with the same disappearing layer n as the caller. In particular, during refinement checking in C<sub>n</sub> only the arm with index α is allowed to modify the global state, and it must do so according to the refined atomic action of the caller. The remaining arms must leave the global state unchanged.

#### **3.2 Concurrent Program Extraction**

Let LP = (*GS*, *AS*, *IS*, *PS*, m, *I*) be a layered concurrent program such that *PS*(m) = (·, ·, ·, ·, h, ·, A<sub>m</sub>). We show how to extract the programs P<sub>1</sub>, …, P<sub>h+1</sub> by defining a function Γ<sub>ℓ</sub>(LP) such that P<sub>ℓ</sub> = Γ<sub>ℓ</sub>(LP) for every ℓ ∈ [1, h + 1]. For a local variable layer mapping *ns* we define the set of local variables with layer number less than ℓ as *ns*|<sub>ℓ</sub> = {v | *ns*(v) < ℓ}. Now the extraction function Γ<sub>ℓ</sub> is defined as

$$
\Gamma\_\ell(\mathcal{LP}) = (gs, as, ps, m', \mathcal{I}),
$$

where

$$\begin{aligned} gs &= \{ v \mid GS(v) = [a, b] \land \ell \in [a+1, b] \}, \\ as &= \{ A \mapsto (I, O, e, t) \mid AS(A) = (I, O, e, t, r) \land \ell \in r \}, \\ ps &= \{ P \mapsto (I \cap ns|\_{\ell}, O \cap ns|\_{\ell}, L \cap ns|\_{\ell}, \Gamma\_{\ell}^{P}(s)) \mid PS(P) = (I, O, L, s, n, ns, \cdot) \land \ell \le n \}, \\ m' &= \begin{cases} m & \text{if } \ell \in [1, h] \\ A\_{m} & \text{if } \ell = h + 1 \end{cases}, \end{aligned}$$

and the extraction of a statement in the body of procedure P is given by

$$\begin{aligned} \Gamma\_{\ell}^{P}(\texttt{skip}) &= \texttt{skip}, \\ \Gamma\_{\ell}^{P}(s\_{1}\,;\,s\_{2}) &= \Gamma\_{\ell}^{P}(s\_{1})\,;\,\Gamma\_{\ell}^{P}(s\_{2}), \\ \Gamma\_{\ell}^{P}(\texttt{if } e \texttt{ then } s\_{1} \texttt{ else } s\_{2}) &= \texttt{if } e \texttt{ then } \Gamma\_{\ell}^{P}(s\_{1}) \texttt{ else } \Gamma\_{\ell}^{P}(s\_{2}), \\ \Gamma\_{\ell}^{P}(\texttt{icall}\,(A, \iota, o)) &= \texttt{skip}, \\ \Gamma\_{\ell}^{P}(\texttt{pcall}\_{\alpha}\,\overline{(Q, \iota, o)}) &= \texttt{pcall}\,\overline{(X, \iota|\_{ns\_{Q}|\_{\ell}}, o|\_{ns\_{P}|\_{\ell}})}, \\ &\qquad \text{where } PS(Q) = (\cdot, \cdot, \cdot, \cdot, n, ns\_{Q}, A) \text{ and } X = \begin{cases} Q & \text{if } \ell \le n \\ A & \text{if } \ell > n \end{cases} \end{aligned}$$

Thus P<sub>ℓ</sub> includes the global and local variables that were introduced before ℓ and the atomic actions with ℓ in their layer range. Furthermore, it does not contain introduction actions, and correspondingly all icall statements are removed. Every arm of a pcall statement, depending on the disappearing layer n of the called procedure Q, either remains a procedure arm to Q or is replaced by an atomic-action arm to A, the atomic action refined by Q. The input and output mappings are restricted to the local variables at layer ℓ. The set of initial stores of P<sub>ℓ</sub> is the same as for LP, since stores range over all program variables.

In our programming language, loops are subsumed by the more general mechanism of recursive procedure calls. Observe that P<sub>ℓ</sub> can indeed have recursive procedure calls, because our type checking rules (Fig. 5) allow a pcall to invoke a procedure with the same disappearing layer as the caller.
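The statement extraction Γ<sub>ℓ</sub> above can be sketched as a small recursive function over an AST (our own tuple encoding, not the paper's implementation; `disappearing` and `refined` stand in for the n and A components of *PS*):

```python
# Sketch (ours) of the extraction Gamma_l on statements: icalls vanish, and
# each pcall arm to Q stays a procedure arm while l <= n_Q, otherwise it
# becomes an arm to Q's refined atomic action.
def extract(stmt, layer, disappearing, refined):
    kind = stmt[0]
    if kind == "skip":
        return ("skip",)
    if kind == "seq":
        return ("seq", extract(stmt[1], layer, disappearing, refined),
                       extract(stmt[2], layer, disappearing, refined))
    if kind == "icall":                     # introduction calls are removed
        return ("skip",)
    if kind == "pcall":
        arms = []
        for q in stmt[1]:
            if layer <= disappearing[q]:
                arms.append(("proc", q))    # Q still exists at this layer
            else:
                arms.append(("action", refined[q]))
        return ("pcall", arms)
    raise ValueError(kind)

disappearing = {"Enter": 1}
refined = {"Enter": "ACQUIRE"}
body = ("seq", ("icall", "iSetLock"), ("pcall", ["Enter"]))
assert extract(body, 1, disappearing, refined) == \
    ("seq", ("skip",), ("pcall", [("proc", "Enter")]))
assert extract(body, 2, disappearing, refined) == \
    ("seq", ("skip",), ("pcall", [("action", "ACQUIRE")]))
```

At layer 1 the call to Enter survives as a procedure arm; at layer 2 it is replaced by its refined atomic action, mirroring Fig. 3(a) versus Fig. 3(b).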

#### **3.3 Running Example**

We return to our lock example from Sect. 2.1. Figure 6 shows its implementation as the layered concurrent program LP*lock* . Layer annotations are indicated using an @ symbol. For example, the global variable b has layer range [0, 1], all occurrences of local variable tid have introduction layer 1, the atomic action ACQUIRE has layer range [2, 2], and the introduction action iSetLock has layer number 1.

First, observe that LP*lock* is well-formed, i.e., ⊢ LP*lock* holds. Then it is an easy exercise to verify that Γ<sub>ℓ</sub>(LP*lock*) = P<sup>lock</sup><sub>ℓ</sub> for ℓ ∈ [1, 3]. Let us focus on procedure Worker. In P<sup>lock</sup><sub>1</sub> (Fig. 3(a)) tid does not exist, and correspondingly the tid parameters of Alloc, Enter, and Leave are absent. Furthermore, the icall in the body of Alloc is replaced with skip. In P<sup>lock</sup><sub>2</sub> (Fig. 3(b)) we have tid, and the calls to Alloc, Enter, and Leave are replaced with their respective refined atomic actions ALLOC, ACQUIRE, and RELEASE. The only annotated pcall in LP*lock* is the recursive call to Enter.

In addition to representing the concurrent programs in Fig. 3, the program LP*lock* also encodes the connection between them via introduction actions and calls. The introduction action iSetLock updates lock to maintain the relationship between lock and b, expressed by the predicate InvLock. It is called in Enter in case the CAS operation successfully set b to *true*, and in Leave when b is set to *false*. The introduction action iIncr implements linear thread identifiers using the integer variable pos, which points to the next value that can be allocated. For every allocation, the current value of pos is returned as the new thread identifier and pos is incremented.

The variable slots is introduced at layer 1 to represent the set of unallocated identifiers. It contains all integers no less than pos, an invariant that is expressed by the predicate InvAlloc and maintained by the code of iIncr. The purpose of slots is to encode linear allocation of thread identifiers in a way that the body of iIncr can be locally shown to preserve the disjointness invariant for linear variables; slots plays a similar role in the specification of the atomic action ALLOC in P<sup>lock</sup><sub>2</sub>. The variable pos is both introduced and hidden at layer 1 so that it exists neither in P<sup>lock</sup><sub>1</sub> nor P<sup>lock</sup><sub>2</sub>. However, pos is present in the checker program C<sub>1</sub> that connects P<sup>lock</sup><sub>1</sub> and P<sup>lock</sup><sub>2</sub>.

**Fig. 6.** Lock example (layered concurrent program)

The bodies of procedures Cas and Reset are not shown in Fig. 6 because they are not needed. They disappear at layer 0 and are replaced by the atomic actions CAS and RESET, respectively, in P<sup>lock</sup><sub>1</sub>.

The degree of compactness afforded by layered programs (as in Fig. 6) over separate specification of each concurrent program (as in Fig. 3) increases rapidly with the size of the program and the maximum depth of procedure calls. In our experience, for realistic programs such as a concurrent garbage collector [7] or a data-race detector [15], the saving in code duplication is significant.

#### **4 Refinement Checking**

Section 3 described how a layered concurrent program LP encodes a sequence P<sub>1</sub>, …, P<sub>h</sub>, P<sub>h+1</sub> of concurrent programs. In this section, we show how the safety of any concurrent program in the sequence is implied by the safety of its successor, ultimately allowing the safety of P<sub>1</sub> to be established by the safety of P<sub>h+1</sub>.

There are three ingredients to connecting P<sub>ℓ</sub> to P<sub>ℓ+1</sub> for any ℓ ∈ [1, h]: reduction, projection, and abstraction. Reduction allows us to conclude the safety of a concurrent program under preemptive semantics by proving safety only under cooperative semantics.

**Theorem 1 (Reduction).** *Let* P *be a concurrent program. If MSafe*(P) *and CSafe*(P)*, then Safe*(P)*.*

The judgment *MSafe*(P) uses logical commutativity reasoning and mover types to ensure that cooperative safety is sufficient for preemptive safety (Sect. 4.1). We use this theorem to justify reasoning about *CSafe*(P) rather than *Safe*(P).

The next step in connecting P<sub>ℓ</sub> to P<sub>ℓ+1</sub> is to introduce the computation introduced at layer ℓ into the cooperative semantics of P<sub>ℓ</sub>. This computation comprises global and local variables together with introduction actions and calls to them. We refer to the resulting program at layer ℓ as P̃<sub>ℓ</sub>.

**Theorem 2 (Projection).** *Let* LP *be a layered concurrent program with top layer* h *and* ℓ ∈ [1, h]*. If CSafe*(P̃<sub>ℓ</sub>)*, then CSafe*(P<sub>ℓ</sub>)*.*

Since introduction actions are nonblocking and P̃<sub>ℓ</sub> is safe under cooperative semantics, every cooperative execution of P<sub>ℓ</sub> can be obtained by projecting away the computation introduced at layer ℓ. This observation allows us to conclude that every cooperative execution of P<sub>ℓ</sub> is also safe. Finally, we check that the safety of the cooperative semantics of P̃<sub>ℓ</sub> is ensured by the safety of the preemptive semantics of the next concurrent program P<sub>ℓ+1</sub>. This connection is established by reasoning about the cooperative semantics of a concurrent checker program C<sub>ℓ</sub> that is automatically constructed from LP.

**Theorem 3 (Abstraction).** *Let* LP *be a layered concurrent program with top layer* h *and* ℓ ∈ [1, h]*. If CSafe*(C<sub>ℓ</sub>) *and Safe*(P<sub>ℓ+1</sub>)*, then CSafe*(P̃<sub>ℓ</sub>)*.*

The checker program C<sub>ℓ</sub> is obtained by instrumenting the code of P̃<sub>ℓ</sub> with extra variables and procedures that enable checking that procedures disappearing at layer ℓ refine their atomic action specifications (Sect. 4.2).

Our refinement check between two consecutive layers is summarized by the following corollary of Theorems 1–3.

**Corollary 1.** *Let* LP *be a layered concurrent program with top layer* h *and* ℓ ∈ [1, h]*. If MSafe*(P<sub>ℓ</sub>)*, CSafe*(C<sub>ℓ</sub>)*, and Safe*(P<sub>ℓ+1</sub>)*, then Safe*(P<sub>ℓ</sub>)*.*

The soundness of our refinement checking methodology for layered concurrent programs is obtained by repeated application of Corollary 1.

**Corollary 2.** *Let* LP *be a layered concurrent program with top layer* h*. If MSafe*(P<sub>ℓ</sub>) *and CSafe*(C<sub>ℓ</sub>) *for all* ℓ ∈ [1, h] *and Safe*(P<sub>h+1</sub>)*, then Safe*(P<sub>1</sub>)*.*

#### **4.1 From Preemptive to Cooperative Semantics**

We present the judgment *MSafe*(P) that allows us to reason about a concurrent program P under cooperative semantics instead of preemptive semantics. Intuitively, we want to use the commutativity of individual atomic actions to rearrange the steps of any execution under preemptive semantics in such a way that it corresponds to an execution under cooperative semantics. We consider mappings M ∈ *Action* → {N, R, L, B} that assign mover types to atomic actions; N for non-mover, R for right mover, L for left mover, and B for both mover. The judgment *MSafe*(P) requires a mapping M that satisfies two conditions.

First, the atomic actions in P must satisfy the following logical commutativity conditions [7], which can be discharged by a theorem prover.


Second, the sequence of atomic actions in preemptive executions of P must be such that the desired rearrangement into cooperative executions is possible.

Given a preemptive execution, consider, for each thread individually, a labeling of execution steps where atomic action steps are labeled with their mover type and procedure calls and returns are labeled with Y (for yield). The nondeterministic *atomicity automaton* A on the right defines all allowed sequences. Intuitively, when we map the execution steps of a thread to a run in the automaton, the state RM denotes that we are in the right mover phase, in which we can stay until the occurrence of a non-right mover (L or N). Then we can stay in the left mover phase (state LM) by executing left movers, until a preemption point (Y) takes us back to RM. Let E be the mapping from edge labels to the set of edges that contain the label, e.g., E(R) = {RM → RM, RM → LM}. Thus we have a representation of mover types as sets of edges in A, and we define E(A) = E(M(A)). Notice that the set representation is closed under relation composition ∘ and intersection, and behaves as expected, e.g., E(R) ∘ E(L) = E(N).
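Since the automaton figure is not reproduced here, the following sketch fixes one concrete edge-set encoding; the exact sets are our reconstruction, chosen to be consistent with the stated example E(R) = {RM → RM, RM → LM} and the identity E(R) ∘ E(L) = E(N):

```python
# Reconstructed edge sets (ours) for the two-state atomicity automaton with
# states RM and LM; each mover type maps to the edges carrying its label.
RM, LM = "RM", "LM"
E = {
    "R": {(RM, RM), (RM, LM)},
    "L": {(RM, LM), (LM, LM)},
    "N": {(RM, LM)},
    "B": {(RM, RM), (RM, LM), (LM, LM)},
    "Y": {(RM, RM), (LM, RM)},
}

def compose(e1, e2):
    """Relation composition on edge sets."""
    return {(x, z) for (x, y) in e1 for (y2, z) in e2 if y == y2}

assert compose(E["R"], E["L"]) == E["N"]   # E(R) ∘ E(L) = E(N), as in the text
assert E["B"] == E["R"] | E["L"]           # a both mover acts as R and as L
```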

Now we define an intraprocedural control flow analysis that lifts E to a mapping Ê on statements. Intuitively, x → y ∈ Ê(s) means that every execution of the statement s has a run in A from x to y. Our analysis does not have to be interprocedural, since procedure calls and returns are labeled with Y, allowing every possible state transition in A. *MSafe*(P) requires Ê(s) ≠ ∅ for every procedure body s in P, where Ê is defined as follows:

$$
\begin{split}
\widehat{\mathcal{E}}(\texttt{skip}) &= \mathcal{E}(\mathsf{B}) \qquad \widehat{\mathcal{E}}(s\_1 \; ; \; s\_2) = \widehat{\mathcal{E}}(s\_1) \circ \widehat{\mathcal{E}}(s\_2) \qquad \widehat{\mathcal{E}}(\texttt{if } e \texttt{ then } s\_1 \texttt{ else } s\_2) = \widehat{\mathcal{E}}(s\_1) \cap \widehat{\mathcal{E}}(s\_2) \\
\widehat{\mathcal{E}}(\texttt{pcall } \overline{A\_1}\,\overline{P}\,\overline{A\_2}) &= \begin{cases}
\mathcal{E}^\*(\overline{A\_1}\,\overline{A\_2}) & \text{if } \overline{P} = \varepsilon \\
\mathcal{E}(\mathsf{L}) \circ \mathcal{E}^\*(\overline{A\_1}) \circ \mathcal{E}(\mathsf{Y}) \circ \mathcal{E}^\*(\overline{A\_2}) \circ \mathcal{E}(\mathsf{R}) & \text{if } \overline{P} \neq \varepsilon
\end{cases}
\end{split}
$$

Skip is a both mover, sequencing composes edges, and if takes the edges possible in both branches. In the arms of a pcall we omit writing the input and output maps because they are irrelevant to the analysis. Let us first focus on the case P̄ = ε with no procedure arms. In the preemptive semantics all arms are arbitrarily interleaved, and correspondingly we define the function

$$\mathcal{E}^\*(A\_1 \cdots A\_n) = \bigcap\_{\tau \in S\_n} \mathcal{E}(A\_{\tau(1)}) \circ \cdots \circ \mathcal{E}(A\_{\tau(n)})$$

to consider all possible permutations (τ ranges over the symmetric group S<sub>n</sub>) and take the edges possible in all permutations. Observe that E<sup>∗</sup> evaluates to a non-empty set in exactly four cases: E(N) for {B}<sup>∗</sup>N{B}<sup>∗</sup>, E(B) for {B}<sup>∗</sup>, E(R) for {R, B}<sup>∗</sup> \ {B}<sup>∗</sup>, and E(L) for {L, B}<sup>∗</sup> \ {B}<sup>∗</sup>. These are the mover-type sequences for which an arbitrary permutation (coming from a preemptive execution) can be rearranged to the order given by the pcall (corresponding to cooperative execution).
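The definition of E<sup>∗</sup> can be executed directly on the reconstructed edge sets (again our own encoding of the two-state automaton, not the paper's implementation):

```python
# Sketch (ours): E*(A_1 ... A_n) intersects the composed edge sets over all
# permutations of the arms, using a reconstructed edge encoding.
from itertools import permutations

RM, LM = "RM", "LM"
E = {"R": {(RM, RM), (RM, LM)}, "L": {(RM, LM), (LM, LM)},
     "N": {(RM, LM)}, "B": {(RM, RM), (RM, LM), (LM, LM)}}

def compose(e1, e2):
    return {(x, z) for (x, y) in e1 for (y2, z) in e2 if y == y2}

def e_star(movers):
    identity = {(RM, RM), (LM, LM)}
    result = {(x, z) for x in (RM, LM) for z in (RM, LM)}
    for perm in permutations(movers):
        edges = identity
        for m in perm:
            edges = compose(edges, E[m])
        result &= edges
    return result

assert e_star(["B", "N", "B"]) == E["N"]   # {B}* N {B}* yields E(N)
assert e_star(["R", "B"]) == E["R"]        # {R,B}* \ {B}* yields E(R)
assert e_star(["R", "L"]) == set()         # mixing R and L: empty, rejected
```

The last assertion shows why a pcall interleaving a right mover with a left mover cannot be summarized: some permutation has no run in the automaton.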

In the case P̄ ≠ ε there is a preemption point under cooperative semantics between A<sub>1</sub> and A<sub>2</sub>: the actions in A<sub>1</sub> are executed in order before the preemption, and the actions in A<sub>2</sub> are executed in order after the preemption. To ensure that the cooperative execution can simulate an arbitrarily interleaved preemptive execution of the pcall, we must be able to move actions in A<sub>1</sub> to the left and actions in A<sub>2</sub> to the right of the preemption point. We enforce this condition by requiring that A<sub>1</sub> contains only left (or both) movers and A<sub>2</sub> only right (or both) movers, expressed by the leading E(L) and trailing E(R) in the edge composition.

#### **4.2 Refinement Checker Programs**

In this section, we describe the construction of checker programs that justify the formal connection between successive concurrent programs in a layered concurrent program. The description is done by example. In particular, we show the checker program C<sup>lock</sup><sub>1</sub> that establishes the connection between P<sup>lock</sup><sub>1</sub> and P<sup>lock</sup><sub>2</sub> (Fig. 3) of our running example.

**Overview.** Cooperative semantics splits any execution of P<sup>lock</sup><sub>1</sub> into a sequence of preemption-free execution fragments separated by preemptions. Verification of C<sup>lock</sup><sub>1</sub> must ensure that for all such executions, the set of procedures that disappear at layer 1 behave like their atomic action specifications. That is, the procedures Enter and Leave must behave like their specifications ACQUIRE and RELEASE, respectively. It is important to note that this goal of checking refinement is easier than verifying that P<sup>lock</sup><sub>1</sub> is safe. Refinement checking may succeed even though P<sup>lock</sup><sub>1</sub> fails; the guarantee of refinement is that such a failure can be simulated by a failure in P<sup>lock</sup><sub>2</sub>. The construction of C<sup>lock</sup><sub>1</sub> can be understood in two steps. First, the program P̃<sup>lock</sup><sub>1</sub> shown in Fig. 7 extends P<sup>lock</sup><sub>1</sub> (Fig. 3(a)) with the variables introduced at layer 1 (globals lock, pos, slots and locals tid) and the corresponding introduction actions (iIncr and iSetLock). Second, C<sup>lock</sup><sub>1</sub> is obtained from P̃<sup>lock</sup><sub>1</sub> by instrumenting the procedures to encode the refinement check, as described in the remainder of this section.

**Fig. 7.** Lock example (variable introduction at layer 1)

**Context for Refinement.** There are two kinds of procedures, those that continue to exist at layer 2 (such as Main and Worker) and those that disappear at layer 1 (such as Enter and Leave). C<sup>lock</sup><sub>1</sub> does not need to verify anything about the first kind. These procedures only provide the context for refinement checking, and thus every invocation of an atomic action (I, O, e, t) in any atomic-action arm of a pcall is converted into the invocation of a fresh atomic action (I, O, *true*, e ∧ t). In other words, the assertions in procedures that continue to exist at layer 2 are converted into assumptions for the refinement checking at layer 1; these assertions are verified during the refinement checking on a higher layer. In our example, Main and Worker do not have atomic-action arms, although this is possible in general.

**Refinement Instrumentation.** We illustrate the instrumentation of procedures Enter and Leave in Fig. 8. The core idea is to track updates by preemption-free execution fragments to the shared variables that continue to exist at layer 2. There are two such variables: lock and slots. We capture snapshots of lock and slots in the local variables \_lock and \_slots and use these snapshots to check that the updates to lock and slots behave according to the refined atomic action. In general, any path from the start to the end of the body of a

**Fig. 8.** Instrumented procedures Enter and Leave (layer 1 checker program)

procedure may comprise many preemption-free execution fragments. The checker program must ensure that exactly one of these fragments behaves like the specified atomic action; all other fragments must leave lock and slots unchanged. To track whether the atomic action has already happened, we use two local Boolean variables: pc and done. Both variables are initialized to *false*, get updated to *true* during the execution, and remain at *true* thereafter. The variable pc is set to *true* at the end of the first preemption-free execution fragment that modifies the tracked state, which is expressed by the macro \*CHANGED\* on line 1. The variable done is set to *true* at the end of the first preemption-free execution fragment that behaves like the refined atomic action. For that, the macros \*RELEASE\* and \*ACQUIRE\* on lines 2 and 3 express the transition relations of RELEASE and ACQUIRE, respectively. Observe that we have the invariant pc ==> done. The reason we need both pc and done is to handle the case where the refined atomic action may stutter (i.e., leave the state unchanged).
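The per-fragment bookkeeping can be rendered as a toy function (ours, not the Fig. 8 code): `changed` abbreviates \*CHANGED\*, and `acts_like_spec` abbreviates the macro for the refined action's transition relation evaluated against the snapshots.

```python
# Toy rendition (ours) of the end-of-fragment check in the instrumentation:
# a change to tracked state must be the first change (pc still false) and
# must agree with the refined action's transition relation.
def end_of_fragment(pc, done, changed, acts_like_spec):
    assert not changed or (not pc and acts_like_spec)
    pc = pc or changed                # tracked state has changed by now
    done = done or acts_like_spec     # the refined action has happened
    assert not pc or done             # the invariant pc ==> done
    return pc, done
```

A fragment that performs the action sets both flags; a stuttering action can set done without pc; a second state-changing fragment fails the assertion, mirroring the checker's refusal of a double update.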

**Instrumenting** Leave**.** We first look at the instrumentation of Leave. Line 8 initializes the snapshot variables. Recall that a preemption inside the code of a procedure is introduced only at a pcall containing a procedure arm. Consequently, the body of Leave is preemption-free and we need to check refinement across a single execution fragment. This checking is done by lines 14–16. The assertion on line 14 checks that if any tracked variable has changed since the last snapshot, (1) such a change happens for the first time (!pc), and (2) the current value is related to the snapshot value according to the specification of RELEASE. Line 15 updates pc to track whether any change to the tracked variables has happened so far. Line 16 updates done to track whether RELEASE has happened so far. The assertion at line 18 checks that RELEASE has indeed happened before Leave returns. The assumption at line 9 blocks those executions which can be simulated by the failure of RELEASE. It achieves this effect by assuming the gate of RELEASE in states where pc is still *false* (i.e., RELEASE has not yet happened). The assumption yields the constraint lock != 0 which together with the invariant InvLock (Fig. 6) proves that the gate of RESET does not fail.

The verification of Leave illustrates an important principle of our approach to refinement. The gates of atomic actions invoked by a procedure P disappearing at layer ℓ are verified using a combination of invariants established on C<sub>ℓ</sub> and pending assertions at layer ℓ + 1 encoded as the gate of the atomic action refined by P. For Leave specifically, assert b in RESET is propagated to assert tid != nil && lock == tid in RELEASE. The latter assertion is verified in the checker program C<sup>lock</sup><sub>2</sub> when Worker, the caller of RELEASE, is shown to refine the action SKIP, which is guaranteed not to fail since its gate is *true*.

**Instrumenting** Enter**.** The most sophisticated feature in a concurrent program is a pcall. The instrumentation of Leave explains the instrumentation of the simplest kind of pcall with only atomic-action arms. We now illustrate the instrumentation of a pcall containing a procedure arm using the procedure Enter which refines the atomic action ACQUIRE and contains a pcall to Enter itself. The instrumentation of this pcall is contained in lines 30–43.

A pcall with a procedure arm is challenging for two reasons. First, the callee disappears at the same layer as the caller so the checker program must reason about refinement for both the caller and the callee. This challenge is addressed by the code in lines 34–40. At line 34, we introduce a nondeterministic choice between two code paths—then branch to check refinement of the caller and else branch to check refinement of the callee. An explanation for this nondeterministic choice is given in the next two paragraphs. Second, a pcall with a procedure arm introduces a preemption creating multiple preemption-free execution fragments. This challenge is addressed by two pieces of code. First, we check that lock and slots are updated correctly (lines 30–32) by the preemption-free execution fragment ending before the pcall. Second, we update the snapshot variables (line 42) to enable the verification of the preemption-free execution fragment beginning after the pcall.

Lines 35–37 in the then branch check refinement against the atomic action specification of the caller, exploiting the atomic action specification of the callee. The actual verification is performed in a fresh procedure Check\_Enter\_Enter invoked on line 35. Notice that this procedure depends on both the caller and the callee (indicated in colors), and that it preserves a necessary preemption point. The procedure has input parameters tid to receive the input of the caller (for refinement checking) and x to receive the input of the callee (to generate the behavior of the callee). Furthermore, pc may be updated in Check\_Enter\_Enter and thus passed as both an input and output parameter. In the body of the procedure, the invocation of action ACQUIRE on line 56 overapproximates the behavior of the callee. In the layered concurrent program (Fig. 6), the (recursive) pcall to Enter in the body of Enter is annotated with 1. This annotation indicates that for any execution passing through this pcall, ACQUIRE is deemed to occur during the execution of its unique arm. This is reflected in the checker program by updating done to *true* on line 37; the update is justified because of the assertion in Check\_Enter\_Enter at line 58. If the pcall being translated was instead unannotated, line 37 would be omitted.

Lines 39–40 in the else branch ensure that using the atomic action specification of the callee on line 56 is justified. Allowing the execution to continue to the callee ensures that the called procedure is invoked in all states allowed by P1. However, the execution is blocked once the call returns to ensure that downstream code sees the side-effect on pc and the snapshot variables.

To summarize, the crux of our instrumentation of procedure arms is to combine refinement checking of caller and callee. We explore the behaviors of the callee to check its refinement. At the same time, we exploit the atomic action specification of the callee to check refinement of the caller.

**Instrumenting Unannotated Procedure Arms.** Procedure Enter illustrates the instrumentation of an annotated procedure arm. The instrumentation of an unannotated procedure arm (both in an annotated or unannotated pcall) is simpler, because we only need to check that the tracked state is not modified. For such an arm to a procedure refining atomic action Action, we introduce a procedure Check\_Action (which is independent of the caller) comprising three instructions: take snapshots, pcall A, and assert !\*CHANGED\*.

**Pcalls with Multiple Arms.** Our examples show the instrumentation of pcalls with a single arm. Handling multiple arms is straightforward, since each arm is translated independently. Atomic action arms stay unmodified, annotated procedure arms are replaced with the corresponding Check\_Caller\_Callee procedure, and unannotated procedure arms are replaced with the corresponding Check\_Action procedure.

**Output Parameters.** Our examples illustrate refinement checking for atomic actions that have no output parameters. In general, a procedure and its atomic action specification may return values in output parameters. We handle this generalization, but space constraints prevent us from presenting the technical details here.

### **5 Conclusion**

In this paper, we presented layered concurrent programs, a programming notation that succinctly captures a multi-layered refinement proof connecting a detailed implementation to a highly abstract specification. We presented an algorithm to extract from a layered concurrent program the individual concurrent programs it encodes, from the most concrete to the most abstract. We also presented an algorithm to extract a collection of refinement checker programs that establish the connection between successive concurrent programs encoded by the layered concurrent program. The cooperative safety of the checker programs and the preemptive safety of the most abstract concurrent program together suffice to prove the preemptive safety of the most concrete concurrent program.

Layered programs have been implemented in Civl, a deductive verifier for concurrent programs, implemented as a conservative extension to the Boogie verifier [3]. Civl has been used to verify a complex concurrent garbage collector [6] and a state-of-the-art data-race detection algorithm [15]. In addition to these two large benchmarks, around fifty smaller programs (including a ticket lock and a lock-free stack) are available at https://github.com/boogie-org/boogie.

There are several directions for future work. We did not discuss how to verify an individual checker program. Civl uses the Owicki-Gries method [13] and rely-guarantee reasoning [8] to verify checker programs, but researchers are exploring many different techniques for the verification of concurrent programs. It would be interesting to investigate whether heterogeneous techniques could be brought to bear on checker programs at different layers.

In this paper, we focused exclusively on verification and did not discuss code generation, an essential aspect of any programming system targeting the construction of verified programs. There is a lot of work to be done in connecting the most concrete program of a layered concurrent program to executable code. Most likely, different execution platforms will impose different obligations on the most concrete program, and the general idea of layered concurrent programs will have to be specialized to different target platforms.

Scalable verification is a challenge as the size of programs being verified increases. Traditionally, scalability has been addressed using modular verification techniques but only for single-layer programs. It would be interesting to explore modularity techniques for concurrent layered programs in the context of a refinement-oriented proof system.

Layered concurrent programs bring new challenges and opportunities to the design of programming languages and development environments. Integrating layers into a programming language requires intuitive syntax to specify layer information and atomic actions. For example, ordered layer names can be more readable and easier to refactor than layer numbers. An integrated development environment could provide different views of the layered concurrent program. For example, it could show the concurrent program, the checker program, and the introduced code at a particular layer. Any updates made in these views should be automatically reflected back into the layered concurrent program.

**Acknowledgements.** We thank Hana Chockler, Stephen Freund, Thomas A. Henzinger, Viktor Toman, and James R. Wilcox for comments that improved this paper. This research was supported in part by the Austrian Science Fund (FWF) under grants S11402-N23 (RiSE/SHiNE) and Z211-N23 (Wittgenstein Award).

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Model Checking

### **Propositional Dynamic Logic for Higher-Order Functional Programs**

Yuki Satake(B) and Hiroshi Unno

University of Tsukuba, Tsukuba, Japan {satake,uhiro}@logic.cs.tsukuba.ac.jp

**Abstract.** We present an extension of propositional dynamic logic called HOT-PDL for specifying temporal properties of higher-order functional programs. The semantics of HOT-PDL is defined over Higher-Order Traces (HOTs) that model execution traces of higher-order programs. A HOT is a sequence of events such as function calls and returns, equipped with two kinds of pointers inspired by the notion of justification pointers from game semantics: one for capturing the correspondence between call and return events, and the other for capturing higher-order control flow involving a function that is passed to or returned by a higher-order function. To allow traversal of the new kinds of pointers, HOT-PDL extends PDL with new path expressions. The extension enables HOT-PDL to specify interesting properties of higher-order programs, including stack-based access control properties and those definable using dependent refinement types. We show that HOT-PDL model checking of higher-order functional programs over bounded integers is decidable via a reduction to modal μ-calculus model checking of higher-order recursion schemes.

#### **1 Introduction**

Temporal verification of higher-order programs has been an emerging research topic [12,14,18,22–24,26,27,31,34]. The specification languages used there are (ω-)regular word languages (that subsume LTL) [12,18,26] and modal μ-calculus (that subsumes CTL) [14,24,31], which are interpreted over sequences or trees consisting of events. (Extended) dependent refinement types are also used to specify temporal [23,27] and branching properties [34]. These specification languages, however, cannot sufficiently express specifications of control flow involving (higher-order) functions. For example, let us consider the following simple higher-order program Dtw (in OCaml syntax):

let tw f x = f (f x) in let inc x = x + 1 in let r = \* in tw inc r

Here, ∗ denotes a non-deterministic integer, and the higher-order function tw : (int → int) → int → int applies its function argument f : int → int to the integer argument x twice. For example, for r = 0, the program Dtw exhibits the following call-by-value reduction sequence (with the redexes underlined).

tw inc 0 −→ (λx.inc (inc x)) 0 −→ inc (inc 0) −→∗ inc 1 −→∗ 2
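Since the example is plain OCaml apart from the nondeterministic ∗, the reduction sequence above can be reproduced directly; the following sketch fixes r to 0 (our choice, purely for illustration):

```ocaml
(* D_tw from the text, with the nondeterministic "*" fixed to 0. *)
let tw f x = f (f x)   (* tw : (int -> int) -> int -> int *)
let inc x = x + 1      (* inc : int -> int *)

let () =
  let r = 0 in                                    (* stands in for "*" *)
  Printf.printf "tw inc %d = %d\n" r (tw inc r)   (* 0 -> 1 -> 2 *)
```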

© The Author(s) 2018

H. Chockler and G. Weissenbacher (Eds.): CAV 2018, LNCS 10981, pp. 105–123, 2018. https://doi.org/10.1007/978-3-319-96145-3_6

Example properties of the program Dtw that cannot be expressed by the previous specification languages are:

Prop.1. If the function returned by a partial application of tw to some function (e.g., λx.inc (inc x) in the above sequence) is called with some integer n, the function argument passed to tw (i.e., inc) is eventually called with n.

Prop.2. If the function returned by a partial application of tw to some function is never called, then the function argument passed to tw is never called.
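Prop.1 can at least be tested dynamically by wrapping the function argument; the harness below is our own instrumentation (not part of the paper's formalism) and records every argument passed to inc, then checks that calling the partial application with n eventually triggers a call of inc with n:

```ocaml
(* Dynamic check of Prop.1: after calling (tw inc) with n, the function
   argument passed to tw must eventually be called with n as well. *)
let tw f x = f (f x)
let inc x = x + 1

let () =
  let calls = ref [] in
  let traced_inc x = calls := x :: !calls; inc x in
  let g = tw traced_inc in      (* partial application of tw *)
  let n = 0 in
  let _ = g n in                (* calls traced_inc with 0, then with 1 *)
  assert (List.mem n !calls)    (* Prop.1 holds on this run *)
```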

To remedy the limitation, we introduce a notion of Higher-Order Trace (HOT) that captures the control flow of higher-order programs and propose a dynamic logic over HOTs called Higher-Order Trace Propositional Dynamic Logic (HOT-PDL) for specifying temporal properties of higher-order programs.

Intuitively, a HOT models a program execution trace which is a possibly infinite sequence of events such as function calls and returns with information about actual arguments and return values. Furthermore, HOTs are equipped with two kinds of pointers to enable precise specification of control flow: one for capturing the correspondence between call and return events, and the other for capturing higher-order control flow involving a function that is passed to or returned by a higher-order function. The two kinds of pointers are inspired by the notion of justification pointers from the game semantics of PCF [1,2,19,20].

For the higher-order program Dtw, for r = 0, we get the following HOT Gtw:¹

Here, • represents some function value, **call**(f, v) represents a call event of the function f with the argument v, and **ret**(f, v) represents a return event of the function f with the return value v. This trace corresponds to the previous reduction sequence: the call events **call**(tw, •), **call**(•, 0), **call**(•, 0), and **call**(•, 1) that occur in the trace in this order correspond respectively to the redexes tw inc, (λx.inc (inc x)) 0, inc 0, and inc 1. The three important points here are that (1) the call events have pointers labeled with **CR** to the corresponding return events **ret**(tw, •), **ret**(•, 2), **ret**(•, 1), and **ret**(•, 2), (2) the call event **call**(tw, •) has two pointers labeled with **CC**, where • represents the function argument f of tw and the pointed call events **call**(•, 0) and **call**(•, 1) represent the two calls to f in tw, and (3) the return event **ret**(tw, •) has a pointer labeled with **RC**, where • represents the partially-applied function λx.inc (inc x) and the pointed call event **call**(•, 0) represents the call to that function.

To allow traversal of the pointers, HOT-PDL extends propositional dynamic logic with new path expressions (see Sect. 3 for details). The extension enables

¹ The symbol ··· indicates the omission of a subsequence. The two omitted subsequences are **call**(inc, 0) **ret**(inc, 1) and **call**(inc, 1) **ret**(inc, 2), in this order, each call carrying a **CR** pointer to its return.

HOT-PDL to specify interesting properties of higher-order programs, including stack-based access control properties and those definable using dependent refinement types. Here, stack-based access control is a security mechanism implemented in runtimes like JVM for ensuring secure execution of programs that have components with different levels of trust: the mechanism ensures that a *security-critical* function (e.g., file access) is invoked only if all the (immediate and indirect) callers in the current call stack are *trusted*, or one of the callers is a *privileged* function and its callees are all *trusted*. We introduce a new variant of stack-based access control properties for higher-order programs, formalized in HOT-PDL from the point of view of interactions among callers and callees.

With respect to expressiveness, HOT-PDL subsumes (ω-)regular languages, because PDL interpreted over words is already as expressive as they are [15]. Temporal logics over nested words [6] such as CaRet [5] and NWTL [4] can capture the correspondence between call and return events (i.e., pointers labeled with **CR**) but cannot capture higher-order control flow (i.e., pointers labeled with **CC** and **RC**). Branching properties (expressible in, e.g., CTL), however, are outside the scope of the present paper, and such an extension of HOT-PDL remains an interesting future direction. Dependent refinement types are often used to specify properties of higher-order programs for partial- and total-correctness verification [29,33,39,40]. For example, the following properties of the program Dtw are expressible:


This paper shows that HOT-PDL can encode such dependent refinement types.

We also study HOT-PDL model checking: given a higher-order program D over bounded integers and a HOT-PDL formula φ, the problem is to decide whether φ is satisfied by all the execution traces of D modeled as HOTs. We show the decidability of HOT-PDL model checking via a reduction to modal μ-calculus model checking of higher-order recursion schemes [21,28].

The rest of the paper is organized as follows. Section 2 formalizes HOTs and explains how to use them to model execution traces of higher-order functional programs. Section 3 defines the syntax and the semantics of HOT-PDL and Sect. 4 shows how to encode stack-based access control properties and dependent refinement types in HOT-PDL. Section 5 discusses HOT-PDL model checking. We compare HOT-PDL with related work in Sect. 6 and conclude the paper with remarks on future work in Sect. 7. Omitted proofs are given in the extended version of this paper [30].

#### **2 Higher-Order Traces**

This section defines the notion of Higher-Order Trace (HOT), which is used to model execution traces of higher-order programs. To this end, we first define (Σ,Γ)*-labeled directed graphs* and *DAGs*.

**Definition 1 (**(Σ,Γ)**-labeled directed graphs).** *Let* Σ *be a finite set of node labels and* Γ *be a finite set of edge labels. A* (Σ,Γ)-labeled directed graph *is defined as a triple* (V, λ, ν)*, where* V *is a countable set of nodes,* λ : V → Σ *is a node labeling function, and* ν : V × V → 2^Γ *is an edge labeling function. We call a* (Σ,Γ)*-labeled directed graph that has no directed cycle a* (Σ,Γ)*-labeled DAG.*

Note that an edge may have multiple labels. For nodes u, u′ ∈ V, ν(u, u′) = ∅ means that there is no edge from u to u′. We use σ and γ as meta-variables ranging respectively over Σ and Γ. We write Vσ for the set {u ∈ V | σ = λ(u)} of all the nodes labeled with σ. We also write VΣ for the set ∪σ∈Σ Vσ. For u, u′ ∈ V, we write u ≺γ u′ if γ ∈ ν(u, u′). The binary relation ≺+γ (resp. ≺∗γ) denotes the transitive (resp. reflexive and transitive) closure of ≺γ.
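Definition 1 can be transcribed almost literally; in the encoding below (names ours), ν is a function returning the label set of an edge, and ≺γ is membership in that set:

```ocaml
(* A (Sigma, Gamma)-labeled directed graph per Definition 1, over int nodes. *)
type ('sigma, 'gamma) lgraph = {
  nodes : int list;                    (* V (kept finite here, for simplicity) *)
  label : int -> 'sigma;               (* lambda : V -> Sigma *)
  edges : int -> int -> 'gamma list;   (* nu : V x V -> 2^Gamma, as a list *)
}

(* u ≺_gamma u'  iff  gamma ∈ nu(u, u'); an empty list means "no edge". *)
let prec g gamma u u' = List.mem gamma (g.edges u u')
```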

**Definition 2 (HOTs).** *A HOT is a* (Σ,Γ)*-labeled DAG* G = (V, λ, ν) *that satisfies:*


Intuitively, Σ**call** (resp. Σ**ret**) represents a set of call (resp. return) events. ΣT**call** (resp. ΣA**call**) represents a set of call events of top-level functions (resp. of functions that are returned by or passed to (higher-order) functions). u ≺**N** u′ means that u′ is the next event of u in the trace. u ≺**CR** u′ indicates that u′ is the return event corresponding to the call event u. u ≺**CC** u′ represents that u′ is a call event of the function argument passed at the call event u. u ≺**RC** u′ means that u′ is a call event of the partially-applied function returned at the return event u. We call the minimum node of a HOT G with respect to ≺**N** the *root node*, denoted by 0G. For HOTs G1 and G2, we say G1 is a *prefix* of G2 and write G1 ⊑ G2, if G1 is a sub-graph of G2 such that 0G1 = 0G2. Note that the HOT Gtw in Sect. 1, where **N**-labeled edges are omitted, satisfies the above conditions, with {**call**(tw, •), **call**(inc, 0), **call**(inc, 1)} ⊆ ΣT**call**, {**call**(•, 0), **call**(•, 1)} ⊆ ΣA**call**, and {**ret**(tw, •), **ret**(inc, 1), **ret**(inc, 2), **ret**(•, 1), **ret**(•, 2)} ⊆ Σ**ret**.

#### **2.1 Trace Semantics for Higher-Order Functional Programs**

We now formalize our target language L, which is an ML-like typed call-by-value higher-order functional language. The syntax is defined by

$$\begin{array}{ll} \text{(programs)} & D ::= \{ f_1 \mapsto \lambda x. e_1, \dots, f_m \mapsto \lambda x. e_m \} \\ \text{(expressions)} & e ::= x \mid f \mid \lambda x. e \mid e_1\, e_2 \mid n \mid \mathbf{op}(e_1, e_2) \mid \mathbf{ifz}\; e_1\; e_2\; e_3 \\ \text{(values)} & v ::= f \mid \lambda x. e \mid n \\ \text{(types)} & \tau ::= \mathbf{int} \mid \tau_1 \to \tau_2 \end{array}$$

Here, x and f are meta-variables ranging respectively over term variables and names of top-level functions. The meta-variable n ranges over the set of bounded integers Zb = {n**min**, ..., n**max**} ⊂ Z. For simplicity of presentation, L has the type int of bounded integers as the only base type. op represents binary operators such as +, −, ×, =, and >. The binary relations = and > return an integer that encodes a Boolean value (e.g., 1 for true and 0 for false). A program D maps each top-level function name fi to its definition λx.ei. We write dom(D) for {f1, ..., fm}. We assume that D has the main function main of the type int → int. The functions in D can be mutually recursive. Expressions e comprise variables x, function names f, lambda abstractions λx.e, function applications e1 e2, bounded integers n, binary operations op(e1, e2), and conditional branches ifz e1 e2 e3. We assume that expressions are simply-typed. As usual, the simple type system guarantees that an evaluation of a typed expression never causes a runtime type mismatch like 1 + λx.x. An expression ifz e1 e2 e3 evaluates to e2 (resp. e3) if e1 evaluates to 0 (resp. a non-zero integer). For example, the program Dtw in Sect. 1 is defined in L as follows:

$$D_{\mathtt{tw}} \stackrel{\triangle}{=} \{ \mathtt{tw} \mapsto \lambda f.\lambda x.\, f\;(f\;x),\; \mathtt{inc} \mapsto \lambda x.\, x + 1,\; \mathtt{main} \mapsto \lambda r.\, \mathtt{tw}\;\mathtt{inc}\;r \}$$
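The abstract syntax of L and the definition of Dtw transcribe directly into OCaml datatypes (constructor names are ours):

```ocaml
(* Abstract syntax of the language L. *)
type expr =
  | Var of string                 (* x *)
  | Fn of string                  (* top-level function name f *)
  | Lam of string * expr          (* lambda x. e *)
  | App of expr * expr            (* e1 e2 *)
  | Int of int                    (* bounded integer n *)
  | Op of string * expr * expr    (* op(e1, e2) *)
  | Ifz of expr * expr * expr     (* ifz e1 e2 e3 *)

(* A program maps each top-level function name to its definition. *)
type program = (string * expr) list

(* D_tw from the text. *)
let d_tw : program = [
  "tw",   Lam ("f", Lam ("x", App (Var "f", App (Var "f", Var "x"))));
  "inc",  Lam ("x", Op ("+", Var "x", Int 1));
  "main", Lam ("r", App (App (Fn "tw", Fn "inc"), Var "r"));
]
```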

$$\begin{array}{ll} \text{(configurations)} & C ::= (I, E[e]) \\ \text{(eval. contexts)} & E ::= [\,] \mid E\;e \mid v\;E \mid \mathbf{op}(E, e) \mid \mathbf{op}(v, E) \mid \mathbf{ifz}\;E\;e_1\;e_2 \mid \mathbf{ret}(h, i, E) \\ \text{(interfaces)} & I ::= \{ h_1 \stackrel{i_1}{\mapsto} v_1, \dots, h_m \stackrel{i_m}{\mapsto} v_m \} \\ \text{(handles)} & h ::= n \mid f \mid \lfloor h \rfloor_i \mid \lceil h \rceil_i \\ \text{(events)} & \alpha ::= \mathbf{call}(h_1, i, h_2) \mid \mathbf{ret}(h_1, i, h_2) \end{array}$$

**Fig. 1.** Labeled transition relations (=⇒ϖ) and (=⇒π) for L

**Fig. 2.** Example trace of Dtw

We now introduce a trace semantics of the language L, which will be used in Sect. 5 to define our model checking problems of higher-order programs. In the trace semantics, a program execution trace is represented by a sequence of function call and return events without an explicit representation of pointers but with enough information to construct them. We will explain how to model traces of L as HOTs by presenting a translation.

The trace semantics [[D]] of the language L is defined as [[D]]fin ∪ [[D]]inf, where [[D]]fin = {ϖ | (I, main n) =⇒ϖ C} and [[D]]inf = {π | (I, main n) =⇒π ⊥} are respectively the sets of *finite* and *infinite* execution traces obtained by evaluating main n for some integer n using the *trace-labeled* multi-step reduction relations =⇒ϖ and =⇒π, which are presented in Fig. 1, under the interface I = {f ↦^0 v | (f ↦ v) ∈ D} annotated with the number of calls to each function occurred so far (i.e., initialized to 0). There, we use ϖ (resp. π) as a meta-variable ranging over finite sequences α1 ··· αm (resp. infinite sequences α1 · α2 ···) of events αi. We write ε for the empty sequence, ϖ1 · ϖ2 for the concatenation of the sequences ϖ1 and ϖ2, and |ϖ| for the length of ϖ. An *event* α is either of the form **call**(h1, i, h2) or **ret**(h1, i, h2), where a *handle* h represents a top-level function or a runtime value exchanged among functions. An event **call**(h1, i, h2) represents the (i + 1)th call to the function h1 with the argument h2. On the other hand, an event **ret**(h1, i, h2) represents the return of the (i + 1)th call to the function h1 with the return value h2.
We thus equip call and return events of h1 with the information about (1) the number i of the calls to h1 occurred so far and (2) the runtime value h2 passed to or returned by h1, so that we can construct pointers (see Definition 3 for details). Note here that handles h are also equipped with meta-information necessary for constructing pointers. More specifically, h is any of the following: a bounded integer n, a top-level function name f ∈ dom(D), the special identifier ⌊h⌋i for the function argument of the (i + 1)th call to the higher-order function h, or the special identifier ⌈h⌉i for the partially-applied function returned by the (i + 1)th call to h. We thus use handles to track, for each function value, where it is constructed and how many times it is called. We shall assume that the syntax of expressions e and values v is also extended with handles h. As we have seen, the finite traces [[D]]fin of a program D are collected using the *terminating* trace-labeled multi-step reduction relation =⇒ϖ on configurations. A *configuration* (I, E[e]) is a pair of an interface I and an expression E[e] consisting of an evaluation context E and a sub-expression e under evaluation. A special evaluation context ret(h, i, E) represents the calling context of the (i + 1)th call to h that waits for the return value computed by E. An *interface* I is defined to be {h1 ↦^{i1} v1, ..., hm ↦^{im} vm}; it maps each function handle hj to its definition vj, where ij records the number of calls to the function hj occurred so far.
In the derivation rules for −→, [[op]] represents the integer function denoted by op, and I{h ↦^i v} represents the interface obtained from I by adding the assignment h ↦^i v (or replacing the existing assignment to h with it). In the rule CInt (resp. RInt) for function calls (resp. returns) with an integer n, the reduction relation is labeled with **call**(h, i, n) (resp. **ret**(h, i, n)). By contrast, in the rule CFun (resp. RFun) for function calls (resp. returns) with a function value v, the special identifier ⌊h⌋i (resp. ⌈h⌉i) for v is used in the label **call**(h, i, ⌊h⌋i) (resp. **ret**(h, i, ⌈h⌉i)) of the reduction relation, and v in the expression is replaced by the identifier. For example, as shown in Fig. 2, the following finite trace ϖtw is generated from the program Dtw:

**call**(main, 0, 0) · **call**(tw, 0, ⌊tw⌋0) · **ret**(tw, 0, ⌈tw⌉0) · **call**(⌈tw⌉0, 0, 0) · **call**(⌊tw⌋0, 0, 0) · **call**(inc, 0, 0) · **ret**(inc, 0, 1) · **ret**(⌊tw⌋0, 0, 1) · **call**(⌊tw⌋0, 1, 1) · **call**(inc, 1, 1) · **ret**(inc, 1, 2) · **ret**(⌊tw⌋0, 1, 2) · **ret**(⌈tw⌉0, 0, 2) · **ret**(main, 0, 2)

Similarly, the infinite traces [[D]]inf of a program D are collected using the *nonterminating* trace-labeled reduction relation C =⇒π ⊥ on configurations. Intuitively, C =⇒π ⊥ means that an execution from the configuration C diverges, producing an infinite event sequence π. In the rule Tranω, the double horizontal line represents that the rule is interpreted co-inductively.

We now define the translation from traces in [[D]]fin to HOTs with ΣT**call** = {**call**(f, n), **call**(f, •) | f ∈ dom(D), n ∈ Zb}, ΣA**call** = {**call**(•, n), **call**(•, •) | n ∈ Zb}, and Σ**ret** = {**ret**(f, n), **ret**(f, •), **ret**(•, n), **ret**(•, •) | f ∈ dom(D), n ∈ Zb}. We shall write Σ(D) for ΣT**call** ∪ ΣA**call** ∪ Σ**ret**. Note that Σ(D) is finite because dom(D) and Zb are finite. We write |α| for the element of Σ(D) obtained from the event α by dropping the second argument and replacing ⌊h⌋i and ⌈h⌉i by •. For example, we get |**call**(tw, 0, ⌊tw⌋0)| = **call**(tw, •).

**Definition 3 (Finite Traces to HOTs).** *Given a finite trace* ϖ = α1 ··· αm ∈ [[D]]fin *with* m > 0*, the corresponding HOT* Gϖ = (V, λ, ν) *is defined by* V = {1, ..., m}*,* λ = {j ↦ |αj| | j ∈ V}*, and* ν *such that:*

- j1 ≺**N** j2 *if* j2 = j1 + 1*,*
- j1 ≺**CR** j2 *if* ∃h, h′, h″, i. αj1 = **call**(h, i, h′) ∧ αj2 = **ret**(h, i, h″)*,*
- j1 ≺**CC** j2 *if* ∃h, h′, h″, i, i′. αj1 = **call**(h′, i, h) ∧ αj2 = **call**(h, i′, h″)*,*
- j1 ≺**RC** j2 *if* ∃h, h′, h″, i, i′. αj1 = **ret**(h′, i, h) ∧ αj2 = **call**(h, i′, h″)*.*

For example, the HOT Gtw in Sect. 1 is translated from the finite trace ϖtw defined above (with the call and return events of main omitted).
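Definition 3 is easy to execute. In the sketch below (encoding entirely ours), handles and events are datatypes, with ⌊h⌋i and ⌈h⌉i rendered as Arg (h, i) and Part (h, i), and the pointer relations are decided by matching on handle and call-count equality; the trace is our transcription of the finite trace of Dtw above (positions are assumed ordered, i.e., the predicates are only applied to pairs j1 < j2):

```ocaml
(* Handles and events of the trace semantics. *)
type handle =
  | N of int                  (* bounded integer *)
  | F of string               (* top-level function name *)
  | Arg of handle * int       (* ⌊h⌋_i: function argument of the (i+1)th call *)
  | Part of handle * int      (* ⌈h⌉_i: partially-applied function returned *)

type event = Call of handle * int * handle | Ret of handle * int * handle

(* The pointer relations of Definition 3, on events at positions j1 < j2. *)
let cr a1 a2 = match a1, a2 with
  | Call (h, i, _), Ret (h', i', _) -> h = h' && i = i'
  | _ -> false

let cc a1 a2 = match a1, a2 with
  | Call (_, _, h), Call (h', _, _) -> h = h'
  | _ -> false

let rc a1 a2 = match a1, a2 with
  | Ret (_, _, h), Call (h', _, _) -> h = h'
  | _ -> false

(* The finite trace of D_tw, transcribed from the text. *)
let arg0 = Arg (F "tw", 0)     (* inc, as seen inside tw *)
let part0 = Part (F "tw", 0)   (* the closure returned by (tw inc) *)

let trace = [|
  Call (F "main", 0, N 0); Call (F "tw", 0, arg0); Ret (F "tw", 0, part0);
  Call (part0, 0, N 0); Call (arg0, 0, N 0); Call (F "inc", 0, N 0);
  Ret (F "inc", 0, N 1); Ret (arg0, 0, N 1); Call (arg0, 1, N 1);
  Call (F "inc", 1, N 1); Ret (F "inc", 1, N 2); Ret (arg0, 1, N 2);
  Ret (part0, 0, N 2); Ret (F "main", 0, N 2)
|]
```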

For an infinite trace π = α1 · α2 ··· ∈ [[D]]inf, the HOT Gπ = (Vπ, λπ, νπ) is defined similarly, with Vπ = {j ∈ N | j ≥ 1} and λπ = {j ↦ |αj| | j ∈ Vπ}.

#### **3 Propositional Dynamic Logic over Higher-Order Traces**

This section presents HOT-PDL, a propositional dynamic logic (PDL) defined over HOTs (see [16] for a general exposition of PDL). HOT-PDL extends path expressions of PDL with →**ret** and →**call** for traversing edges of HOTs labeled respectively with **CR** and **CC**/**RC**. The syntax is defined by:

$$\begin{array}{ll} \text{(formulas)} & \phi ::= p \mid \phi_1 \wedge \phi_2 \mid \neg \phi \mid [\pi]\,\phi \\ \text{(path expressions)} & \pi ::= \;\to\; \mid\; \to_{\mathbf{call}} \mid\; \to_{\mathbf{ret}} \mid \{\phi\}? \mid \pi_1 \cdot \pi_2 \mid \pi_1 + \pi_2 \mid \pi^* \end{array}$$

Here, p is a meta-variable ranging over atomic propositions AP. Let ⊤ and ⊥ denote tautology and contradiction, respectively. Path expressions π are defined using a syntax based on regular expressions: we have concatenation π1 · π2, alternation π1 + π2, and Kleene star π∗. We write π+ for π · π∗. Path expressions →, →**ret**, and →**call** are for traversing edges labeled with **N**, **CR**, and **CC** or **RC**, respectively. A path expression {φ}? is for testing whether φ holds at the current node. A formula [π] φ means that φ always holds if one moves along any path represented by the path expression π. The dual formula ⟨π⟩ φ is defined by ¬[π]¬φ and means that there is a path represented by π such that φ holds if one moves along the path. ⟨π⟩ and [π] have the same priority as ¬.

We now define the semantics of HOT-PDL. For a given HOT G = (V, λ, ν) with Σ = AP, λ(u) represents the atomic proposition satisfied at the node u ∈ V. We define the semantics [[φ]]G of a formula φ as the set of all nodes u ∈ V

**Fig. 3.** The pairs of nodes in Gtw related by **CR** or <sup>F</sup>

where φ is satisfied, and the semantics [[π]]G of a path expression π as the set of all pairs (u1, u2) ∈ V × V such that one can move along π from u1 to u2.

$$\begin{aligned}
[\![p]\!]_G &= \{u \in V \mid p = \lambda(u)\} \qquad [\![\phi_1 \land \phi_2]\!]_G = [\![\phi_1]\!]_G \cap [\![\phi_2]\!]_G \qquad [\![\neg\phi]\!]_G = V \setminus [\![\phi]\!]_G \\
[\![[\pi]\,\phi]\!]_G &= \{u \in V \mid \forall u'.\ ((u, u') \in [\![\pi]\!]_G \Rightarrow u' \in [\![\phi]\!]_G)\} \\
[\![\to]\!]_G &= \rightsquigarrow_{\mathbf{N}} \qquad [\![\to_{\mathrm{ret}}]\!]_G = \rightsquigarrow_{\mathbf{CR}} \qquad [\![\to_{\mathrm{call}}]\!]_G = \rightsquigarrow_{\mathbf{CC}} \cup \rightsquigarrow_{\mathbf{RC}} \\
[\![\{\phi\}?]\!]_G &= \{(u, u) \in V \times V \mid u \in [\![\phi]\!]_G\} \\
[\![\pi_1 \cdot \pi_2]\!]_G &= \{(u_1, u_3) \in V \times V \mid \exists u_2 \in V.\ (u_1, u_2) \in [\![\pi_1]\!]_G \land (u_2, u_3) \in [\![\pi_2]\!]_G\} \\
[\![\pi_1 + \pi_2]\!]_G &= [\![\pi_1]\!]_G \cup [\![\pi_2]\!]_G \qquad [\![\pi^*]\!]_G = \bigcup_{m \ge 0} [\![\pi]\!]_G^m
\end{aligned}$$

Here, for a binary relation R, R^m denotes the m-th power of R. Note that this semantics interprets a given HOT-PDL formula over both finite and infinite HOTs. [[p]]G consists of all nodes labeled by p. [[[π] φ]]G contains all nodes from which we always reach a node in [[φ]]G if we take a path represented by π. [[→]]G, [[→**ret**]]G, and [[→**call**]]G contain the pairs of nodes linked by an edge labeled by **N**, **CR**, and **CC** or **RC**, respectively. We write G |= φ if the initial node of G is in [[φ]]G. For example, consider the HOT Gtw and AP = Σ(Dtw). Then, [[⟨→⟩ **ret**(tw, •)]]Gtw consists of the node labeled by **call**(tw, •). [[⟨→**ret**⟩ **ret**(•, 2)]]Gtw consists of a node labeled by **call**(•, 0) and the node labeled by **call**(•, 1). [[⟨→**call**⟩ **call**(•, 0)]]Gtw consists of the two nodes respectively labeled by **call**(tw, •) and **ret**(tw, •). The example properties of Dtw discussed in Sect. 1 can be expressed as follows:
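To make the set-based semantics concrete, the following is a minimal Python sketch of a HOT-PDL evaluator over a finite HOT. The tuple encoding of formulas and path expressions and the three-node example trace are our own illustrative choices, not part of the paper's formalization.

```python
def eval_formula(phi, nodes, label, edges):
    """Return [[phi]]_G: the set of nodes satisfying formula phi."""
    op = phi[0]
    if op == "atom":                      # [[p]] = {u | p = lambda(u)}
        return {u for u in nodes if label[u] == phi[1]}
    if op == "and":
        return (eval_formula(phi[1], nodes, label, edges)
                & eval_formula(phi[2], nodes, label, edges))
    if op == "not":                       # [[not phi]] = V \ [[phi]]
        return set(nodes) - eval_formula(phi[1], nodes, label, edges)
    if op == "box":                       # [[ [pi] phi ]]
        rel = eval_path(phi[1], nodes, label, edges)
        sat = eval_formula(phi[2], nodes, label, edges)
        return {u for u in nodes if all(v in sat for (s, v) in rel if s == u)}
    raise ValueError(op)

def eval_path(pi, nodes, label, edges):
    """Return [[pi]]_G: the set of node pairs related by path expression pi."""
    op = pi[0]
    if op == "step":                      # ->, ->ret, ->call: edges with given labels
        return {(u, v) for (u, g, v) in edges if g in pi[1]}
    if op == "test":                      # {phi}?
        return {(u, u) for u in eval_formula(pi[1], nodes, label, edges)}
    if op == "seq":                       # pi1 . pi2: relational composition
        r1 = eval_path(pi[1], nodes, label, edges)
        r2 = eval_path(pi[2], nodes, label, edges)
        return {(u, w) for (u, v) in r1 for (x, w) in r2 if v == x}
    if op == "alt":                       # pi1 + pi2
        return (eval_path(pi[1], nodes, label, edges)
                | eval_path(pi[2], nodes, label, edges))
    if op == "star":                      # union of all finite powers of [[pi]]
        rel = {(u, u) for u in nodes} | eval_path(pi[1], nodes, label, edges)
        while True:
            new = rel | {(u, w) for (u, v) in rel for (x, w) in rel if v == x}
            if new == rel:
                return rel
            rel = new
    raise ValueError(op)

# A three-node trace: call f, call g, ret f, with N edges between
# consecutive events and a CR edge from call f to its matching return.
nodes = {0, 1, 2}
label = {0: "call_f", 1: "call_g", 2: "ret_f"}
edges = {(0, "N", 1), (1, "N", 2), (0, "CR", 2)}

# <->ret> ret_f, i.e. not [->ret] not ret_f, holds exactly at call f.
diamond = ("not", ("box", ("step", {"CR"}), ("not", ("atom", "ret_f"))))
print(eval_formula(diamond, nodes, label, edges))  # {0}
```

The Kleene star is computed by a naive fixpoint iteration, which mirrors the union over all powers in the definition and suffices on finite HOTs.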

$$\begin{array}{l}
\text{Prop.1: } [\to^*] \bigwedge_{x \in \mathbb{Z}_b} ((\mathbf{call}(\mathsf{tw}, \bullet) \land \langle \to_{\mathrm{ret}} \cdot \to_{\mathrm{call}} \rangle\, \mathbf{call}(\bullet, x)) \Rightarrow \langle \to_{\mathrm{call}} \rangle\, \mathbf{call}(\bullet, x)) \\
\text{Prop.2: } [\to^*] ((\mathbf{call}(\mathsf{tw}, \bullet) \land \neg \langle \to_{\mathrm{ret}} \cdot \to_{\mathrm{call}} \rangle\, \top) \Rightarrow \neg \langle \to_{\mathrm{call}} \rangle\, \top)
\end{array}$$

Here, $\bigwedge_{x \in \mathbb{Z}_b} \phi$ abbreviates $[n_{\mathrm{min}}/x]\,\phi \land \cdots \land [n_{\mathrm{max}}/x]\,\phi$.

In Sect. 4, we show further examples that express interesting properties of higher-order programs, including stack-based access control properties and those

**Fig. 4.** The pairs of nodes in Gtw related by **CR**, **CC**, **RC**, or ⇝H

definable using dependent refinement types. We here prepare the notation used there. First, we overload the symbols Σ**call**, Σ**ret**, and Σ⊤**call** to denote the path expressions {⋁Σ**call**}?, {⋁Σ**ret**}?, and {⋁Σ⊤**call**}?, respectively. We write →F for the path expression →**ret** · →, which is used to move from a call event to the next event of the caller (by skipping to the event that follows the corresponding return event). We also write ⇝F for the path expression Σ**call** · → · (→F)∗ · Σ**call**, which is used to move from a call event to any call event invoked by the callee. Figure 3 illustrates the pairs of nodes in Gtw related by ⇝F. To capture the control flow of higher-order programs, where function callers and callees may exchange functions as values, we need to use the **CC**- and **RC**-labeled edges. For example, an event raised by the function argument f_arg of a higher-order function f could be regarded as an event of the caller g of f, because f_arg is constructed by g. Similarly, an event raised by the (partially applied) function f_ret returned by a function f could be regarded as an event of f. To formalize this idea, we introduce variants →H and ⇝H of →F and ⇝F that take higher-order control flow into consideration: →H denotes (→**ret** · →) + (→**call** · →), and ⇝H denotes Σ⊤**call** · → · (→H)∗ · Σ⊤**call**. Note that the source and the target of ⇝H are restricted to call events of top-level functions. Figure 4 illustrates the pairs of nodes in Gtw related by ⇝H, where nodes labeled with events of the same function (in the sense discussed above) are arranged on the same horizontal line.
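As a first-order special case, a →F step can be computed directly on a trace whose **CR** pointers are given as a partial map from calls to their matching returns. The dictionary encoding below is our own illustrative sketch:

```python
def f_successor(node, n_edge, cr_edge):
    """->F = ->ret . -> : from a call event, follow the CR pointer to the
    matching return, then the N edge to the caller's next event."""
    matching_return = cr_edge.get(node)
    if matching_return is None:
        return None  # the call never returns on this trace
    return n_edge.get(matching_return)

# Trace: 0 = call f, 1 = call g, 2 = ret g, 3 = ret f, 4 = end.
n_edge = {0: 1, 1: 2, 2: 3, 3: 4}   # N edges between consecutive events
cr_edge = {0: 3, 1: 2}              # CR pointers from calls to returns
print(f_successor(1, n_edge, cr_edge))  # 3: skips g's body, lands at ret f
```

Iterating `f_successor` from a call event walks the caller's events in order, which is exactly how ⇝F is used to traverse a call stack in Sect. 4.2.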

#### **4 Applications of HOT-PDL**

We show how to encode dependent refinement types and stack-based access control properties using HOT-PDL.

#### **4.1 Dependent Refinement Types**

HOT-PDL can specify pre- and post-conditions of higher-order functions by encoding dependent refinement types τ for partial [29,33,40] and total [23,27,34,36,39] correctness verification, defined by: τ ::= {ν | ψ} | (x : τ1) → τ2^Q. Here, Q is either ∀ or ∃. An integer refinement type {ν | ψ} is the type of bounded integers ν that satisfy the refinement formula ψ over bounded integers. A dependent function type (x : τ1) → τ2^∀ is the type of functions that, for any argument x conforming to the type τ1, *if terminating*, return a value conforming to the type τ2. By contrast, (x : τ1) → τ2^∃ is the type of functions that, for any argument x conforming to τ1, *always terminate* and return a value conforming to τ2. For example, Prop.3 and Prop.4 of Dtw are expressed by the following types of tw:

$$\begin{aligned} \text{Prop.3: } & (f : (x : \mathsf{int}) \to \{\nu \mid \nu > x\}^{\forall}) \to ((x : \mathsf{int}) \to \{\nu \mid \nu > x\}^{\forall})^{\forall} \\ \text{Prop.4: } & (f : (x : \mathsf{int}) \to \mathsf{int}^{\exists}) \to ((x : \mathsf{int}) \to \mathsf{int}^{\exists})^{\forall} \end{aligned}$$

We here write int for {ν | ⊤}. These types can be encoded in HOT-PDL as:

$$\begin{array}{l}
\text{Prop.3: } \mathbf{call}(\mathsf{tw}, \bullet) \Rightarrow ([\to_{\mathrm{call}}]\, \mathsf{incr}(\bullet)) \land [\to_{\mathrm{ret}}]\, (\mathbf{ret}(\mathsf{tw}, \bullet) \Rightarrow [\to_{\mathrm{call}}]\, \mathsf{incr}(\bullet)) \\
\text{Prop.4: } \mathbf{call}(\mathsf{tw}, \bullet) \Rightarrow ([\to_{\mathrm{call}}]\, \mathsf{term}(\bullet)) \land [\to_{\mathrm{ret}}]\, (\mathbf{ret}(\mathsf{tw}, \bullet) \Rightarrow [\to_{\mathrm{call}}]\, \mathsf{term}(\bullet))
\end{array}$$

Here, $\mathsf{incr}(g) = \bigwedge_{x \in \mathbb{Z}_b} (\mathbf{call}(g, x) \Rightarrow [\to_{\mathrm{ret}}] \bigwedge_{y \in \mathbb{Z}_b} (\mathbf{ret}(g, y) \Rightarrow y > x))$ and $\mathsf{term}(g) = \bigwedge_{x \in \mathbb{Z}_b} (\mathbf{call}(g, x) \Rightarrow \langle \to_{\mathrm{ret}} \rangle \top)$ for $g \in \{\bullet\} \cup \{f \mid f \in \mathrm{dom}(D)\}$. We now define a translation F from types to HOT-PDL formulas as follows:

$$\begin{aligned}
F(g, (x : \tau_1) \to \tau_2^Q) &= \bigwedge_{x \in |\tau_1|} (\mathbf{call}(g, x) \Rightarrow F_{\mathrm{arg}}(x, \tau_1) \land F_{\mathrm{ret}}(g, \tau_2^Q)) \\
|(x : \tau_1) \to \tau_2^Q| &= \{\bullet\} \qquad\qquad |\{x \mid \psi\}| = \mathbb{Z}_b \\
F_{\mathrm{arg}}(\bullet, \tau) &= [\to_{\mathrm{call}}]\, F(\bullet, \tau) \qquad F_{\mathrm{arg}}(n, \{x \mid \psi\}) = \begin{cases} \top & (\text{if } \models [n/x]\psi) \\ \bot & (\text{if } \not\models [n/x]\psi) \end{cases} \\
F_{\mathrm{ret}}(g, \tau^{\forall}) &= [\to_{\mathrm{ret}}] \bigwedge_{x \in |\tau|} (\mathbf{ret}(g, x) \Rightarrow F(x, \tau)) \\
F_{\mathrm{ret}}(g, \tau^{\exists}) &= (\langle \to_{\mathrm{ret}} \rangle \top) \land F_{\mathrm{ret}}(g, \tau^{\forall})
\end{aligned}$$

#### **4.2 Stack-Based Access Control Properties**

As briefly summarized in Sect. 1, stack-based access control [13] ensures that a *security-critical* function (e.g., file access) is invoked only if all the (immediate and indirect) callers in the current call stack are *trusted*, or one of the callers is a *privileged* function and its callees are all *trusted*. We here use HOT-PDL to specify stack-based access control properties for higher-order programs. Let **Critical**, **Trusted**, and **Priv** be HOT-PDL formulas that tell whether the current node is labeled with a call event of security-critical, trusted, and privileged functions, respectively. We assume that **Critical**, **Priv**, and ¬**Trusted** do not overlap each other, and a function in **Priv** can be directly called only from a function in **Trusted**. Then, one may think we can express the specification as:

$$\neg \left\langle \leadsto_F^* \cdot \{\neg\mathbf{Trusted}\}? \cdot (\leadsto_F \cdot \{\neg\mathbf{Priv}\}?)^+ \right\rangle \mathbf{Critical}$$

Here, the path expression ⇝F introduced in Sect. 3 is used to traverse the call stack bottom-up. The above formula says that an invalid call stack never occurs, where a call stack is called *invalid* if it contains a call to an untrusted function (represented by the part ⇝F∗ · {¬**Trusted**}?), followed by a call to a critical function (represented by **Critical**), with no intervening call to a privileged function (represented by (⇝F · {¬**Priv**}?)+).
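Read over ordinary first-order call stacks, the formula corresponds to the classic stack-inspection algorithm: walk the stack from the innermost caller outward and deny if an untrusted frame appears before a privileged one. A minimal sketch, with our own frame and predicate encoding (the paper's assumption that privileged functions are trusted is baked in):

```python
def stack_permits_critical(stack, trusted, privileged):
    """stack: list of caller frames, innermost last.
    A critical call is permitted if every caller up to (and including)
    the nearest privileged frame is trusted."""
    for frame in reversed(stack):
        if frame not in trusted:
            return False   # untrusted caller reached before any privileged frame
        if frame in privileged:
            return True    # privileged frame shields the rest of the stack
    return True            # all callers trusted

trusted = {"main", "io", "priv"}
privileged = {"priv"}
print(stack_permits_critical(["main", "evil"], trusted, privileged))  # False
```

The higher-order refinement in the remainder of this section replaces the "walk one frame up" step with ⇝H, so that a function value raises events on behalf of the function that constructed it rather than the function that happens to call it.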

This definition, however, is not sufficient for our higher-order language. Let us consider the following program Dpa, which involves a partial application:

```
let untrusted () = λu.critical u
let main () = untrusted () ()
```

Here, untrusted ∉ **Trusted** and critical ∈ **Critical**. Intuitively, Dpa should be regarded as *unsafe* because critical in the body of untrusted is called. However, Dpa satisfies the specification above (under the assumption that anonymous functions are in **Trusted**), because the partial application untrusted () never causes a call to critical but just returns the anonymous (and trusted) function λu.critical u. The following higher-order program Dho is yet another unsafe example that satisfies the specification:

```
let privileged f = f ()
let trusted f = if test () then privileged f else ()
let untrusted () = trusted (λx.crash (); critical ())
let main () = untrusted ()
```
Here, privileged ∈ **Priv**, trusted ∈ **Trusted**, untrusted ∉ **Trusted**, and critical ∈ **Critical**. Note that critical in the body of untrusted is called as follows: the anonymous function λx.crash (); critical () is first passed to trusted and then to privileged (if test () returns true), and is finally called by privileged, causing a call to critical.

To remedy the limitation, we introduce a new refined variant of stack-based access control properties for higher-order programs, formalized in HOT-PDL from the point of view of interactions among callers and callees as follows:

$$\neg \left\langle \leadsto_H^* \cdot \{\neg\mathbf{Trusted}\}? \cdot (\leadsto_H \cdot \{\neg\mathbf{Priv}\}?)^+ \right\rangle \mathbf{Critical}$$

Note that this is obtained from the previous version by just replacing ⇝F with ⇝H, which takes into account which function constructed each function value exchanged among functions. The refined version rejects the unsafe Dpa and Dho as intended: Dpa (resp. Dho) is rejected because the call event of λu.critical u (resp. λx.crash (); critical ()) is regarded as an event of untrusted.

Fournet and Gordon [13] have studied variants of stack-based access control properties for a call-by-value higher-order language. We conclude this section by comparing ours with one of theirs called "stack inspection with frame capture".<sup>2</sup> The ideas behind the two are similar but what follows illustrates the difference:

```
let untrusted f = crash (); f ()
let trusted x = untrusted (λx.if test () then critical () else ())
let main () = trusted ()
```
This program satisfies ours but violates theirs. Note that ours allows a function originally constructed by a trusted function to invoke a critical function even if the function is passed around by an untrusted function. By contrast, in their definition, a trusted function value gets "contaminated" (i.e., prevented from invoking a critical function) once it is passed to or returned by an untrusted function. In some cases, their conservative policy is useful, but we believe ours is more semantically robust (e.g., it even works well with the CPS transformation).

#### **5 HOT-PDL Model Checking**

In this section, we define HOT-PDL model checking problems for higher-order functional programs over bounded integers and sketch a proof of the decidability.

**Definition 4 (HOT-PDL model checking).** *Given a program* D *and a HOT-PDL formula* φ *with* AP = Σ(D)*, HOT-PDL model checking is the problem of deciding whether* Gσ |= φ *for all* σ ∈ [[D]]fin *and* Gπ |= φ *for all* π ∈ [[D]]inf*.*

**Theorem 1 (Decidability).** *HOT-PDL model checking is decidable.*

We show this by a reduction to modal μ-calculus (μ-ML) model checking of higher-order recursion schemes (HORSs), which is known to be decidable [21,28]. A HORS is a grammar for generating a (possibly infinite) ranked tree; HORSs are essentially simply-typed lambda calculi with general recursion, tree constructors, and finite data domains such as Booleans and bounded integers.

In the reduction, we encode the set of HOTs generated from the given program D as a single tree (generated by a HORS). For example, Fig. 5 shows such a tree that encodes the HOTs of Dtw.<sup>3</sup> There, a node labeled with end represents the termination of the program. Note that the branching at the root node is due to the input to the function main. The subtree whose root node is labeled with **call**(main, 0) is obtained from the HOT Gtw by appending a special node labeled with end, adding, for each edge with label γ ∈ {**N**, **CR**, **CC**, **RC**}, a new node labeled with γ, and expanding the resulting DAG into a tree. Thus, the edge labels of Gtw are turned into node labels of the tree.

<sup>2</sup> We do not compare with the other variants in [13] because they are too syntactic to be preserved by simple program transformations like inlining.

<sup>3</sup> There, for simplicity, we illustrate an *unranked* tree and omit the labels of branching nodes. In the formalization, we express an unranked tree as a binary tree using a special node label **br** of arity 2, representing binary branching.

It is also worth mentioning here that we are allowed to expand DAGs into trees because the truth value of a HOT-PDL formula is not affected by node sharing in the given HOT. This nice property is lost if we extend the path expressions of HOT-PDL, for example, with intersections. Thus, the decidability of model checking for such extensions of HOT-PDL is an open problem.

**Fig. 5.** A tree encoding the HOTs generated from Dtw

We next explain our translation from a HOT-PDL formula into a μ-ML formula interpreted over trees that encode HOTs. Our translation is based on an existing one for ordinary PDL [11]. The syntax of μ-ML is defined as follows:

$$\varphi ::= X \mid p \mid \neg \varphi \mid \varphi \land \varphi \mid \Box \varphi \mid \nu X. \varphi \mid \mu X. \varphi$$

Here, X represents a propositional variable and p represents an atomic proposition. A formula □ϕ means that ϕ holds at every child of the current node. A formula μX.ϕ (resp. νX.ϕ) represents the least (resp. greatest) fixpoint of the function λX.ϕ, where we assume that X occurs only positively in ϕ. For example, the HOT-PDL formulas [→] p, [→**ret**] p, and [→**call**] p are respectively translated into the μ-ML formulas □(νX. (**N** ⇒ □p) ∧ (**br** ⇒ □X)), □(νX. (**CR** ⇒ □p) ∧ (**br** ⇒ □X)), and □(νX. ((**CC** ∨ **RC**) ⇒ □p) ∧ (**br** ⇒ □X)), where the greatest fixpoints are used to skip the branching nodes labeled with **br** (which may repeat infinitely).

Finally, we explain how to obtain a HORS for generating a tree that encodes the set of HOTs generated from the given program D. We here need to simulate pointer traversals of HOT-PDL by using purely functional features of HORSs because μ-ML does not support pointers. Intuitively, we obtain the desired HORS from D by embedding an event monitor and an event handler. Whenever the monitor detects a function call or return event during the execution of D, the handler creates a new node labeled with the event or ignores the event until a certain event is detected by the monitor, depending on the current mode of the handler. The handler has the following three modes:

m**N**: The handler always creates and links two new nodes u**N** and uα, labeled respectively with **N** and the observed event α. The handler then continues as follows, depending on the form of the event α:


of the modes <sup>m</sup>**<sup>N</sup>** and <sup>m</sup>**call** continue to create subtrees of <sup>u</sup>α.


For simplicity of the construction, we assume that D is in continuation-passing style (CPS). This loses no generality because the form can be enforced by the CPS transformation. Because CPS makes the order of function call and return events explicit, it simplifies event monitoring, handling, and the tracking of the handler's current mode, which often changes as monitoring proceeds.

#### **6 Related Work**

HOT-PDL can specify temporal trace properties of higher-order programs. An extension for specifying branching properties, however, remains future work.

Logics and formal languages over structures richer than words have been proposed before. Regular languages of nested words, or equivalently, visibly pushdown languages (VPLs), were introduced by Alur and Madhusudan [7]. An (ω-)nested word is a (possibly infinite) word with additional well-nested pointers from call events to the corresponding return events. Compared to the temporal logics CaRet [5] and NWTL [4] over (ω-)nested words, HOT-PDL is defined over HOTs, which have richer structures. Recall that a HOT is equipped with two kinds of pointers: one kind with the label **CR**, which is the same as the pointers of nested words, and the other kind with the label **CC** or **RC**, which is newly introduced to capture higher-order control flow. Bollig et al. proposed nested traces as a generalization of nested words for modeling traces of concurrent (first-order) recursive programs, and presented temporal logics over nested traces [8]. Nested traces, however, cannot model traces of higher-order programs. We expect that a combination of our work with theirs would enable the specification of temporal trace properties of concurrent and higher-order recursive programs. Cyriac et al. have recently introduced an extension of PDL defined over traces of *order-2* collapsible pushdown systems (CPDS) [3]. Interestingly, their traces are also equipped with two kinds of pointers: one kind captures the correspondence between ordinary push and pop stack operations, and the other captures the correspondence between order-2 push and pop operations on second-order stacks. Our work deals with higher-order programs, which correspond to order-n CPDS for arbitrary n.

Finally, we compare HOT-PDL with existing logics defined over words. It is well known that LTL is less expressive than ω-regular languages [38]. To remedy this limitation, Wolper introduced ETL [38], which allows users to define new temporal operators using right-linear grammars. Henriksen and Thiagarajan proposed DLTL [17], which generalizes the until operator of LTL using regular expressions. Leucker and Sánchez proposed RLTL [25], which combines LTL and regular expressions. De Giacomo and Vardi introduced Linear Dynamic Logic (LDL), a variant of PDL interpreted over infinite words [15,35]. LDLf, a variant of PDL interpreted over finite words, has also been studied in [15]. ETL, DLTL, RLTL, and LDL are as expressive as ω-regular languages. Note that HOT-PDL subsumes (ω-)regular languages because LDL and LDLf can be naturally embedded in HOT-PDL. (ω-)VPLs strictly subsume (ω-)regular languages. Though CaRet [5] and NWTL [4] are defined over nested words, they do not capture the full class of VPLs [10]. To remedy this limitation, VLTL [10] combines LTL and VRE [9] in the style of RLTL, where VRE is a generalization of regular expressions for VPLs. VLDL [37] extends LDL by replacing the path expressions with VPLs over finite words. VLTL and VLDL exactly characterize ω-VPLs. Because VPLs and HOT-PDL are incomparable, it remains future work to extend HOT-PDL to subsume (ω-)VPLs.

#### **7 Conclusion and Future Work**

We have presented HOT-PDL, an extension of PDL defined over HOTs that model execution traces of call-by-value and higher-order programs. HOT-PDL enables a precise specification of temporal trace properties of higher-order programs and consequently provides a foundation for specification in various application domains including stack-based access control and dependent refinement types. We have also studied HOT-PDL model checking and presented a reduction method to modal μ-calculus model checking of higher-order recursion schemes.

To further widen the scope of our approach, it is worth investigating how to adapt HOTs and HOT-PDL to call-by-name and/or effectful languages. To this end, it is natural to incorporate ideas from game semantics [1,20,32] and to extend HOTs with new kinds of events and pointers for capturing call-by-name and/or effectful computations.

**Acknowledgments.** We would like to thank anonymous referees for their useful comments. This work was supported by JSPS KAKENHI Grant Numbers 15H05706, 16H05856, 17H01720, and 17H01723.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **Syntax-Guided Termination Analysis**

Grigory Fedyukovich(B) , Yueling Zhang, and Aarti Gupta

Princeton University, Princeton, USA {grigoryf,yuelingz,aartig}@cs.princeton.edu

**Abstract.** We present new algorithms for proving program termination and non-termination using syntax-guided synthesis. They exploit the symbolic encoding of programs and automatically construct a formal grammar for symbolic constraints that are used to synthesize either a termination argument or a non-terminating program refinement. The constraints are then added back to the program encoding, and an off-the-shelf constraint solver decides on their fitness and on the progress of the algorithms. The evaluation of our implementation, called FreqTerm, shows that although the formal grammar is limited to the syntax of the program, in the majority of cases our algorithms are effective and fast. Importantly, FreqTerm is competitive with the state of the art on a wide range of terminating and non-terminating benchmarks, and it significantly outperforms the state of the art on proving non-termination of a class of programs arising from large-scale Event-Condition-Action systems.

#### **1 Introduction**

Originating in the field of program synthesis, syntax-guided synthesis (SyGuS) [2] has recently been applied [14,16] to the verification of program safety. In general, a SyGuS-based method walks through a set of candidates, restricted by a formal grammar, and searches for a candidate that meets the predetermined specification. The distinguishing insight of [14,16], in which SyGuS discovers inductive invariants, is that the formal grammar need not be provided by the user (as in applications to program synthesis); instead, it can be constructed automatically, on the fly, from the symbolic encoding of the program being analyzed. Despite being incomplete, the approach shows remarkable practical success due to its ability to discover various facts about program behaviors whose syntactic representations are compact and look similar to the actual program statements.

Problems of proving and disproving program termination have a known connection to safety verification, e.g., [7,19,28,39,40]. In particular, to prove termination, a program could be augmented by a counter (or a set of counters) that is

This work was supported in part by NSF Grant 1525936.

Y. Zhang—Visiting Student Research Collaborator from East China Normal University, China.

c The Author(s) 2018

H. Chockler and G. Weissenbacher (Eds.): CAV 2018, LNCS 10981, pp. 124–143, 2018. https://doi.org/10.1007/978-3-319-96145-3\_7

initially assigned a reasonably large value and monotonically decreases at each iteration [38]. It remains to solve a safety verification task: to prove that the counter never goes negative. On the other hand, to prove that a program has only infinite traces, one could prove that the negation of a loop guard is never reachable, which boils down to another safety verification task. This knowledge motivates us not only to exploit safety verification as a subroutine in our techniques, but also to adapt successful methods across application domains.
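The reduction can be illustrated concretely (the loop, counter name, and bound guesses below are ours, not the paper's): instrument the loop with a fresh counter initialized to a guessed bound and decremented at each iteration; termination within that bound is exactly the safety property that the counter is non-negative at every loop head.

```python
def bound_is_sufficient(x0, n, guess):
    """Execute 'while x < n: x = x + 1' instrumented with a counter.
    Returns True iff the safety property 'the counter is non-negative
    at every loop head' holds, i.e. the guess bounds the iteration count."""
    x, c = x0, guess
    while x < n:
        if c < 0:
            return False  # safety violated: the guessed bound is too small
        x, c = x + 1, c - 1
    return True

print(bound_is_sufficient(0, 5, 5), bound_is_sufficient(0, 5, 2))  # True False
```

A real prover, of course, discharges this check symbolically (here, by handing the instrumented encoding to a Horn solver) rather than by executing the program.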

We present a set of SyGuS-based algorithms for proving and disproving termination. For the former, our algorithm LinRank adds a decrementing counter to a loop and iteratively guesses lower bounds on its initial value (using syntactic patterns obtained from the code), which leads to safety verification tasks to be solved by an off-the-shelf Horn solver. Existence of an inductive invariant guarantees termination, and the algorithm converges. Otherwise, LinRank proceeds to strengthen the lower bounds by adding another guess. Similarly, our algorithm LexRank deals with a system of extra counters ordered lexicographically and thus enables termination analysis for a wider class of programs.

For proving non-termination, we present a novel algorithm NontermRef that iteratively searches for a restriction on the loop guard that *might lead* to infinite traces. Since safety verification cannot in general answer such queries, we build NontermRef on top of a solver for the validity of ∀∃-formulas. In particular, we prove that if the desired restriction is fulfilled at the beginning of an iteration, then there exists a sequence of states from the beginning to the end of that iteration, and the restriction is fulfilled at the end of that iteration as well. Recent symbolic techniques [15] for handling quantifier alternation enabled us to prove non-termination of a large class of programs for which a reduction to safety verification is not effective.

These three algorithms are independent of each other, but they all rely on a generator of constraints that are then applied in different contexts. This distinguishes our work from most related approaches [7,18,20,23,30,32,36,39,40]. The key insight, adapted from [14,16], is that the syntactic structures appearing in the program give rise to a formal grammar from which many candidates can be sampled. Because the grammar is composed of a finite number of numeric constants, operators, and variable combinations, the number of sampled constraints is always finite. Furthermore, since our samples are syntactically close to the actual constructs that appear in the code, they often provide practical guidance towards the proof of the task. Thus, in the majority of cases, the algorithms converge with a successful result.

We have implemented our algorithms in a tool called FreqTerm, which utilizes solvers for Satisfiability Modulo Theories (SMT) [11,15] and for satisfiability of constrained Horn clauses [16,24,26]. These automatic provers become more robust and powerful every day, which affects the performance of FreqTerm only positively. We have evaluated FreqTerm on a range of terminating and non-terminating programs taken from SVCOMP<sup>1</sup> and on large-scale benchmarks

<sup>1</sup> Software Verification Competition, http://sv-comp.sosy-lab.org/.

arising from Event-Condition-Action systems<sup>2</sup> (ECA). Compared to state-of-the-art termination analyzers [18,22,30], FreqTerm exhibits competitive runtimes and achieves several orders of magnitude performance improvement when proving non-termination of ECAs.

In the rest of the paper, we give background on automated verification (Sect. 2) and on SyGuS (Sect. 3); then we describe the application of SyGuS for proving termination (Sect. 4) and non-termination (Sect. 5). Finally, after reporting experimental results (Sect. 6), we overview related work (Sect. 7) and conclude the paper (Sect. 8).

#### **2 Background and Notation**

In this work, we formulate tasks arising in automated program analysis by encoding them into instances of the SMT problem [12]: given a first-order formula ϕ over a background theory, decide whether there is an assignment m of values from the theory to the variables in ϕ that makes ϕ true (denoted m |= ϕ). If every assignment to ϕ is also an assignment to some formula ψ, we write ϕ =⇒ ψ.

**Definition 1.** *A* transition system P *is a tuple* ⟨*V* ∪ *V′*, *Init*, *Tr*⟩*, where V is a vector of variables; V′ is its primed copy; and formulas Init and Tr encode the* initial states *and the* transition relation*, respectively.*

We view *programs* as *transition systems* and throughout the paper use both terms interchangeably. An assignment s of values to all variables in *V* (or to any copy of *V*, such as *V′*) is called a *state*. A *trace* is a (possibly infinite) sequence of states s, s′, ..., such that (1) s |= *Init*, and (2) for each i, (s^(i), s^(i+1)) |= *Tr*.

We assume, without loss of generality, that the transition-relation formula *Tr*(*V*, *V′*) is in conjunctive normal form, and we split *Tr*(*V*, *V′*) into a conjunction *Guard*(*V*) ∧ *Body*(*V*, *V′*), where *Guard*(*V*) is the maximal subset of conjuncts of *Tr* expressed over variables from *V* only, and every conjunct of *Body*(*V*, *V′*) may contain variables from both *V* and *V′*.

Intuitively, formula *Guard*(*V*) encodes the loop guard of the program, whose loop body is encoded in *Body*(*V*, *V′*). For example, for the program shown in Fig. 1a, *V* = {x, y, K}, *Guard* = y<K ∨ y>K, and the entire encoding of the transition relation is shown in Fig. 1b.

**Definition 2.** *If each program trace contains a state* s*, such that* s |= ¬*Guard , then the program is called* terminating *(otherwise, it is called* non-terminating*).*

Tasks of proving termination and non-termination are often reduced to tasks of proving program safety. A *safety verification task* is a pair ⟨P, *Err*⟩, where P = ⟨*V* ∪ *V′*, *Init*, *Tr*⟩ is a program, and *Err* is an encoding of the *error states*. It has a solution if there exists a formula, called a *safe inductive invariant*, that is implied by *Init*, is closed under *Tr*, and is inconsistent with *Err*.

<sup>2</sup> Provided at http://rers-challenge.org/2012/index.php?page=problems.

**Fig. 1.** (a): C-code; (b): transition relation *Tr* (in the framebox – *Guard*); (c): formulas S extracted from *Tr* and normalized; (d): grammar that generalizes S.

**Definition 3.** *Let* P = ⟨*V* ∪ *V′*, *Init*, *Tr*⟩*; a formula Inv is a* safe inductive invariant *if the following conditions hold:* (1) *Init*(*V*) =⇒ *Inv*(*V*)*,* (2) *Inv*(*V*) ∧ *Tr*(*V*, *V′*) =⇒ *Inv*(*V′*)*, and* (3) *Inv*(*V*) ∧ *Err*(*V*) =⇒ ⊥*.*

If there exists a trace c (called a *counterexample*) that contains a state s, such that s |= *Err* , then the safety verification task does not have a solution.

#### **3 Exploiting Program Syntax**

The key driver of our termination and non-termination provers is a generator of constraints that help to analyze the given program in different ways. The source code often provides useful information, e.g., occurrences of variables, constants, and arithmetic and comparison operators, that can bootstrap the formula generator. We rely on the SyGuS-based algorithm [16] introduced for verifying program safety. It automatically constructs a grammar G based on a fixed set of formulas S obtained by traversing the parse trees of *Init*, *Tr*, and *Err*. In our case, *Err* is not given, so G is based only on *Init* and *Tr*.

For simplicity, we require formulas in S to have the form of inequalities composed from a linear combination over either *V* or *V′* and a constant (e.g., x′ < y′ + 1 is included, but x′ = x + 1 is excluded). Then, if needed, variables are deprimed (e.g., x′ < y′ + 1 is replaced by x < y + 1), and formulas are normalized such that all terms are moved to the left side (e.g., x < y + 1 is replaced by x − y − 1 < 0), subtraction is rewritten as addition, < is rewritten as >, and respectively ≤ as ≥ (e.g., x − y − 1 < 0 is replaced by (−1) · x + y + 1 > 0).
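The depriming and normalization steps can be sketched as follows, under the assumption that an inequality is represented as a triple (coefficient map, operator, constant); the representation is ours, chosen only to illustrate the rewrites described above:

```python
# Sketch of depriming and normalization of candidate inequalities.

def deprime(coeffs):
    """Replace primed variables by their unprimed versions."""
    return {v.rstrip("'"): c for v, c in coeffs.items()}

def normalize(coeffs, op, const):
    """Move every term to the left side, then rewrite < as > (and
    <= as >=) by negating all terms."""
    left = dict(coeffs)
    left["1"] = left.get("1", 0) - const   # "1" keys the free constant
    if op in ("<", "<="):
        left = {v: -c for v, c in left.items()}
        op = ">" if op == "<" else ">="
    return left, op

# x' < y' + 1  --deprime-->  x < y + 1  --normalize--> (-1)*x + y + 1 > 0
left, op = normalize(deprime({"x'": 1, "y'": -1}), "<", 1)
assert (left, op) == ({"x": -1, "y": 1, "1": 1}, ">")
```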

The entire process of creation of G is exemplified in Fig. 1. Production rules of G are constructed as follows: (1) the production rule for normalized inequalities

③ *Inv*(x, y, i, K) ∧ (y < K ∨ y > K) ∧ i < 0 =⇒ ⊥

**Fig. 2.** (a): The worst-case dynamics of the program from Fig. 1a; (b): the termination-argument validity check (in the frameboxes – lower bounds {ℓ<sub>j</sub>} for i).

(denoted ineq) consists of choices corresponding to distinct types of inequalities in S, (2) the production rule for linear combinations (denoted sum) consists of choices corresponding to distinct arities of inequalities in S, (3) production rules for variables, coefficients, and constants (denoted respectively var, coef, and const) consist of choices corresponding respectively to distinct variables, coefficients, and constants that occur in inequalities in S. Note that the method of creation of G naturally extends to considering disjunctions and nonlinear arithmetic [16].

Choices in production rules of grammar G can be further assigned probabilities based on frequencies of certain syntactic features (e.g., frequencies of particular constants or combinations of variables) that belong to the program's symbolic encoding. In the interest of saving space, we do not discuss it here and refer the reader to [16]. The generation of formulas from G is performed recursively by sampling from probability distributions assigned to rules. Note that the choice of distributions affects only the order in which formulas are sampled and does not affect which formulas *can* or *cannot* be sampled in principle (because the grammar is fixed). Thus, without loss of generality, it is sound to assume that all distributions are uniform. In the context of termination analysis, we are interested in formulas produced by rules ineq and sum.
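The interplay between sampling and the fixed grammar can be sketched as follows; the toy grammar mimics the shape of Fig. 1d, and names like `all_candidates` and `sample_and_adjust` are ours, not FreqHorn's API:

```python
# Toy illustration of grammar-based sampling: the distribution only
# affects sampling order, never the set of reachable formulas.
import itertools
import random

def all_candidates(variables, coefs, consts):
    """Enumerate sum-rule productions coef*v1 + coef*v2 + const."""
    cands = set()
    for v1, v2 in itertools.combinations(variables, 2):
        for c1, c2, c0 in itertools.product(coefs, coefs, consts):
            cands.add(((c1, v1), (c2, v2), c0))
    return cands

def sample_and_adjust(cands, rng):
    """Uniform draw; 'adjust' removes the sample so it can never be
    drawn again (the progress guarantee discussed above)."""
    cand = rng.choice(sorted(cands))
    cands.remove(cand)
    return cand

cands = all_candidates(["x", "y", "K"], [-1, 1], [0, 1])
assert len(cands) == 24
rng = random.Random(0)
seen = {sample_and_adjust(cands, rng) for _ in range(24)}
assert len(seen) == 24 and not cands   # every candidate sampled once
```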

#### **4 Proving Termination**

We start this section with a motivating example and then present the general-purpose algorithms for proving program termination.

*Example 1.* The program shown in Fig. 1a terminates. It operates on three integer variables, x, y, and K: in each iteration y gets closer to x, and x gets closer to K. Thus, the total number of values taken by y before it equals K is no bigger than the maximal distance among x, y, and K (in the following, denoted *Max*). The worst-case dynamics happens when initially x < y < K (shown in Fig. 2a); in other cases the program terminates even faster. To formally prove this, the program can be augmented by a so-called *termination argument*. For this example, it is simply a fresh variable i which is initially assigned *Max* (or any other value greater than *Max*) and which gets decremented by one in each iteration. The goal now is to prove that i never gets negative. Fig. 2b shows the encoding of this safety verification task (recall Definition 3). The existence of a solution to this task guarantees the safety of the augmented program, and thus, the termination of the original program. Most state-of-the-art Horn solvers are able to find a solution immediately.

**Algorithm 1.** LinRank(P): proving termination with a linear termination argument

```
Input:  P = ⟨V ∪ V′, Init, Tr⟩ where Tr = Guard ∧ Body
Output: res ∈ {terminates, unknown}
1   V ← V ∪ {i};  V′ ← V′ ∪ {i′};
2   Tr ← Tr ∧ i′ = i − 1;  Err ← Guard ∧ i < 0;
3   G ← getGrammarAndDistributions(Init, Tr);
4   while canSample(G) do
5       cand ← sample(G, sum);
6       G ← adjust(G, cand);
7       if Init =⇒ i > cand then continue;
8       Init ← Init ∧ i > cand;
9       if isSafe(Init, Tr, Err) then return terminates;
10  return unknown;
```

The main challenge in preparing the termination-argument validity check is the generation of lower bounds {ℓ<sub>j</sub>} for i in *Init* (e.g., conjunctions of the form i > ℓ<sub>j</sub> in ① in Fig. 2b). We build on the insight that each ℓ<sub>j</sub> can be constructed independently of the others, and then an inequality i > ℓ<sub>j</sub> can be conjoined with *Init*, thus giving rise to a new safety verification task. To generate candidate inequalities, we utilize the algorithm from Sect. 3: all {ℓ<sub>j</sub>} can be sampled from the grammar G, which is obtained in advance from *Init* and *Tr*.

For example, all six formulas in ① in Fig. 2b: x − K, K − x, y − K, K − y, x − y, and y − x belong to the grammar shown in Fig. 1d. Note that for proving termination it is not necessary to have the most precise lower bounds. Intuitively, the larger the initial value of i, the more iterations it will stay positive. Thus, it is sound to try formulas which are not even related to actual lower bounds at all and keep them conjoined with *Init*.
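As a rough illustration only (the paper discharges the actual check with a Horn solver), the following simulates a reconstruction of the Fig. 1a dynamics; the update rule "y steps toward x, x steps toward K" is our reading of the description, since the figure is not reproduced here:

```python
# Simulation-based sanity check of a candidate termination argument.

def step(x, y, K):
    y += 1 if y < x else -1        # y gets closer to x (assumed update)
    x += 1 if x < K else -1        # x gets closer to K (assumed update)
    return x, y, K

def stays_nonnegative(x, y, K, i):
    """Decrement counter i once per iteration; report whether i is
    still nonnegative whenever the guard y != K holds."""
    while y != K:
        if i < 0:
            return False
        x, y, K = step(x, y, K)
        i -= 1
    return True

x, y, K = 0, 3, 10                             # worst case: x < y < K
max_dist = max(abs(x - y), abs(y - K), abs(x - K))
assert stays_nonnegative(x, y, K, max_dist)    # i := Max suffices here
assert not stays_nonnegative(x, y, K, 2)       # too small a bound fails
```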

#### **4.1 Synthesizing Linear Termination Arguments**

Algorithm 1 shows an "*enumerate-and-try*" procedure to search for a linear termination argument that proves termination of a program P. To initialize this search, the algorithm introduces an extra counter variable i and adds it to *V* (respectively, its primed copy i′ gets added to *V′*) (line 1).<sup>3</sup> Then the transition-relation formula *Tr* gets augmented by i′ = i − 1, the decrement of the counter in the loop body. To specify the set of error states, Algorithm 1 introduces a formula *Err* (line 2): the loop guard is satisfied while the value of counter i is negative. Algorithm 1 then starts searching for sufficiently large lower bounds for i (i.e., a set of constraints over *V* ∪ {i} to be added to *Init*), such that no error state is ever reachable.

Before the main loop of our synthesis procedure starts, various formulas are extracted from the symbolic encoding of P and generalized to a formal grammar (line 3). The grammar is used for an iterative probabilistic sampling of candidate formulas (line 5) that are further added to the validity check of the current termination argument (line 8). In particular, each new constraint over i has the form i>*cand*, where *cand* is produced by the sum production rule described in Sect. 3. Once *Init* is strengthened by this constraint, a new safety verification condition is compiled and checked (line 9) by an off-the-shelf Horn solver.

As a result of each safety check, either a formula satisfying Definition 3 or a counterexample *cex* witnessing reachability of an error state is generated. Existence of an inductive invariant guarantees that the conjunction of all synthesized lower bounds for i is large enough to prove termination, and thus Algorithm 1 converges. Otherwise, if grammar G still contains a formula that has not been considered yet, the synthesis loop iterates.

For the progress of the algorithm, it must keep track of the strength of each new candidate *cand*. That is, *cand* should add more restrictions on i in *Init*. Otherwise, the outcome of the validity check (line 9) would be the same as in the previous iteration. For this reason, Algorithm 1 includes an important routine [16]: after each sampled candidate *cand*, it adjusts the probability distributions associated with the grammar, such that *cand* cannot be sampled again in future iterations (line 6). Additionally, it checks (line 7) whether a new constraint adds some value over the already accepted constraints. Consequently, our algorithm does not require explicit handling of counterexamples: if in each iteration *Init* only gets stronger, then the current *cex* is invalidated. In principle, the algorithm could explicitly store *cex* and check its consistency with each new *cand*; however, in our experiments this did not lead to significant performance gains.
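The skeleton of this enumerate-and-try loop (lines 4–9 of Algorithm 1) can be sketched with both solver calls replaced by caller-supplied stubs; the names and the toy instance below are ours, not the tool's API:

```python
# Skeleton of Algorithm 1's synthesis loop with stubbed checks.

def lin_rank(candidates, is_implied, is_safe):
    accepted = []
    for cand in candidates:             # sample + adjust: each cand once
        if is_implied(cand, accepted):  # line 7: adds no new restriction
            continue
        accepted.append(cand)           # line 8: Init := Init /\ i > cand
        if is_safe(accepted):           # line 9: Horn-solver stub
            return "terminates", accepted
    return "unknown", accepted

# Toy instance: lower bounds are plain integers; "safety" holds once
# some accepted bound reaches 5 (standing in for "i starts large
# enough"), and a bound is implied if a larger one is already accepted.
is_implied = lambda c, acc: bool(acc) and c <= max(acc)
is_safe = lambda acc: max(acc) >= 5
res, acc = lin_rank([1, 3, 2, 7, 9], is_implied, is_safe)
assert res == "terminates" and acc == [1, 3, 7]
```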

**Theorem 1.** *If Algorithm 1 returns* terminates *for program* P*, then* P *terminates.*

Indeed, the verification condition, which is proven safe in the last iteration of Algorithm 1, corresponds to some program P′ that differs from P by the presence of variable i. The set of traces of P has a one-to-one correspondence with the

<sup>3</sup> Assume that initially set *V* does not contain i.

**Algorithm 2.** LexRank(P): proving termination with a lexicographic termination argument

```
Input:  P = ⟨V ∪ V′, Init, Tr⟩ where Tr = Guard ∧ Body
Output: res ∈ {terminates, unknown}
1   V ← V ∪ {i, j};  V′ ← V′ ∪ {i′, j′};
2   Err ← Guard ∧ i < 0;  jBounds ← ∅;
3   ⟨G, G′, G′′⟩ ← getGrammarAndDistributions(Init, Tr);
4   while canSample(G) or canSample(G′) or canSample(G′′) do
5       if nondet() then
6           cand ← sample(G, sum);  G ← adjust(G, cand);
7           Init ← Init ∧ i > cand;
8       if nondet() then
9           cand ← sample(G′, sum);  G′ ← adjust(G′, cand);
10          Init ← Init ∧ j > cand;
11      if nondet() then
12          cand ← sample(G′′, sum);  G′′ ← adjust(G′′, cand);
13          jBounds ← jBounds ∪ {j > cand};
14      Tr′ ← Tr ∧ ite(j > 0,  i′ = i ∧ j′ = j − 1,  i′ = i − 1 ∧ ⋀_{b ∈ jBounds} b);
15      if isSafe(Init, Tr′, Err) then return terminates;
16  return unknown;
```

set of traces of P′, such that each state reachable in P can be extended by a valuation of i to become a reachable state in P′. That is, P terminates iff P′ terminates, and P′ terminates by construction: i is initially assigned a reasonably large value, monotonically decreases at each iteration, and never goes negative.

We note that the loop in Algorithm 1 always executes only a finite number of iterations since G is constructed from a finite number of components, and in each iteration it gets adjusted to avoid re-sampling of the same candidates. However, an off-the-shelf Horn solver that checks the validity of each candidate might not converge because the safety verification task is undecidable in general. To mitigate this obstacle, our implementation supports several state-of-the-art solvers and provides the flexibility to specify which one to use.

#### **4.2 Synthesizing Lexicographic Termination Arguments**

There is a wide class of terminating programs for which no linear termination argument exists. A commonly used approach to handle them is via a search for a so-called lexicographic termination argument that requires introducing two or more extra counters. A SyGuS-based instantiation of such a procedure for two counters is shown in Algorithm 2 (more counters could be handled similarly). Algorithm 2 has a similar structure to Algorithm 1: the initial program gets augmented by counters, formula *Err* is introduced, lower bounds for counters are iteratively sampled and added to *Init* and *Tr* , and the verification condition is checked for safety.

The differences in Algorithm 2 are in how it handles the two counters i and j, between which an implicit order is fixed. In particular, *Err* is still expressed over i only, but i gets decremented by one only when j equals zero (line 14). At the same time, j gets updated in each iteration: if it was equal to zero, it gets assigned a value satisfying the conjunction of constraints in an auxiliary set *jBounds*; otherwise it gets decremented by one. Algorithm 2 synthesizes *jBounds* as well as lower bounds for the initial conditions over i and j. The sampling proceeds separately from three different grammars (lines 6, 9, and 12), and the samples are used in three different contexts (lines 7, 10, and 13 respectively). Optionally, Algorithm 2 can be parametrized by a synthesis strategy that gives interpretations for each of the nondet() calls (lines 5, 8, and 11 respectively). In the simplest case, each nondet() call is replaced by ⊤, which means that in each iteration Algorithm 2 needs to sample from all three grammars. Alternatively, nondet() could be replaced by a method that identifies only one grammar per iteration to be sampled from.
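The counter update of line 14 can be sketched as a single transition; `j_bound` below is a stand-in for a value satisfying the conjunction *jBounds*, and the loop demonstrates that the pair ⟨i, j⟩ decreases lexicographically at every step:

```python
# One transition of the counter pair from line 14 of Algorithm 2.

def lex_step(i, j, j_bound):
    """ite(j > 0, {i' = i, j' = j - 1}, {i' = i - 1, j' |= jBounds})"""
    if j > 0:
        return i, j - 1          # inner counter ticks down
    return i - 1, j_bound        # outer counter ticks, j is reset

# (i, j) strictly decreases in the lexicographic order at every step,
# so only finitely many iterations can keep i >= 0.
i, j, steps = 2, 3, 0
while i >= 0:
    prev = (i, j)
    i, j = lex_step(i, j, j_bound=3)
    steps += 1
    assert (i, j) < prev         # Python tuple order = lexicographic
assert steps == 12               # (2 + 1) * (3 + 1) steps to descend
```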

**Theorem 2.** *If Algorithm 2 returns* terminates *for program* P*, then* P *terminates.*

The proof sketch for Theorem 2 is similar to the one for Theorem 1: an augmented program P′ terminates by construction (due to a mapping of the values of ⟨i, j⟩ into ordinals), and its set of traces has a one-to-one correspondence with the set of traces of P.

#### **5 Proving Non-termination**

In this section, we aim at solving the task opposite to the one in Sect. 4, i.e., we wish to witness infinite program traces and thus prove program non-termination. However, in contrast to a traditional search for a single infinite trace, it is often easier to search for groups of infinite traces.

**Lemma 1.** *Program* P = ⟨*V* ∪ *V′*, *Init*, *Tr*⟩ *where Tr* = *Guard* ∧ *Body does not terminate if:*

*(1)* Init(V) ∧ Guard(V) =⇒ ⊥ *does not hold, and*
*(2)* ∀V . Guard(V) =⇒ ∃V′ . Body(V, V′) ∧ Guard(V′)*.*
The lemma distinguishes a class of programs, for which the following holds. First, the loop guard is reachable from the set of initial states. Second, whenever the loop guard is satisfied, there exists a transition to a state in which the loop guard is satisfied again. Therefore, each initial state s, from which the loop guard is reachable, gives rise to at least one infinite trace that starts with s.

Note that for programs with deterministic transition relations (like, e.g., in Fig. 1a), the check of the second condition of Lemma 1 reduces to deciding the

**Fig. 3.** (a): A variant of the program from Fig. 1a; (b): the valid ∀∃-formula for its non-terminating refinement (in frameboxes – refined *Guard*-s); (c): an example of a non-terminating dynamics, when the value of x (and eventually, y) never gets changed.

satisfiability of a quantifier-free formula since each state can be transitioned to exactly one state. But if the transition relation is non-deterministic, the check reduces to deciding validity of a ∀∃-formula. Although handling quantifiers is in general hard, some recent approaches [15] are particularly tailored to solve this type of queries efficiently.
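The deterministic case can be sketched as follows: since each state has exactly one successor, "∃V′. Body ∧ Guard(V′)" collapses to "Guard(next(s))", a quantifier-free check per state. Finite enumeration stands in here for the SMT validity query:

```python
# Quantifier collapse for deterministic transition relations.

def forall_exists_det(guard, next_state, states):
    """For every enumerated state satisfying the guard, its unique
    successor must satisfy the guard as well."""
    return all(guard(next_state(s)) for s in states if guard(s))

guard = lambda s: s > 0
# while x > 0: x = x + 1 -- once entered, the guard holds forever
assert forall_exists_det(guard, lambda s: s + 1, range(-5, 50))
# while x > 0: x = x - 1 -- the condition fails at x = 1
assert not forall_exists_det(guard, lambda s: s - 1, range(-5, 50))
```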

In practice, the conditions of Lemma 1 are too strict to be fulfilled for an arbitrary program. However, to prove non-termination, it is sufficient to constrain the transition relation, as long as the result preserves at least one original transition, and only then apply Lemma 1.

**Definition 4.** *Given programs* P = ⟨*V* ∪ *V′*, *Init*, *Tr*⟩ *and* P′ = ⟨*V* ∪ *V′*, *Init*, *Tr′*⟩*, we say that* P′ *is a* refinement *of* P *if Tr′* =⇒ *Tr.*

Intuitively, Definition 4 requires P and P′ to operate over the same sets of variables and to start from the same initial states. Furthermore, each transition allowed by *Tr′* is also allowed by *Tr*. One way to refine P is to restrict *Tr* = *Guard* ∧ *Body* by conjoining either *Guard*, or *Body*, or both with some extra constraints (called *refinement constraints*). In this work, we propose to sample them from our automatically constructed formal grammar (recall Sect. 3).

*Example 2.* Consider the program shown in Fig. 3a. It differs from the one shown in Fig. 1a by a non-deterministic choice in the second ite-statement. That is, y still moves towards x; but x moves towards K only when x > K, and otherwise x may always keep its initial value. The formal grammar generated for this program is the same as the one shown in Fig. 1d, and it contains constraints x < K and y < K. Lemma 1 does not apply to the program as is, but it does after refining *Guard* with those constraints. In particular, the ∀∃-formula in Fig. 3b is valid, and a witness to its validity is depicted in Fig. 3c: eventually both x and y become equal and always remain smaller than K. Thus, the program does not terminate.

**Algorithm 3.** NontermRef(P): proving non-termination

```
Input:  P = ⟨V ∪ V′, Init, Tr⟩ where Tr = Guard ∧ Body
Output: res ∈ {terminates, does not terminate, unknown}
1   if Init(V) ∧ Guard(V) =⇒ ⊥ then return terminates;
2   Tr ← Tr ∧ getInvs(Init, Tr);
3   G ← getGrammarAndDistributions(Init, Tr);
4   Refs ← ∅;  Gramms ← ∅;  Gramms.push(G);
5   while true do
6       if ∀V . Guard(V) ∧ ⋀_{r ∈ Refs} r(V) =⇒
               ∃V′ . Body(V, V′) ∧ Guard(V′) ∧ ⋀_{r ∈ Refs} r(V′) then
7           return does not terminate;
8       cand ← ⊤;
9       while Guard(V) ∧ ⋀_{r ∈ Refs} r(V) =⇒ cand(V) or
               Init(V) ∧ Guard(V) ∧ cand(V) ∧ ⋀_{r ∈ Refs} r(V) =⇒ ⊥ do
10          if Refs = ∅ and ¬canSample(G) then return unknown;
11          if Refs ≠ ∅ and ¬canSample(G) then
12              Refs.pop();
13              Gramms.pop();
14              cand ← ⊤;  G ← Gramms.top();
15              continue;
16          cand ← sample(G, ineq);
17      G ← adjust(G, cand);
18      Refs.push(cand);
19      Gramms.push(G);
```

#### **5.1 Synthesizing Non-terminating Refinements**

The algorithm for proving a program's non-termination is shown in Algorithm 3. It starts with a simple satisfiability check (line 1) which filters out programs that never reach the loop body (and thus immediately terminate). Then, the transition relation *Tr* gets strengthened by auxiliary inductive invariants obtained with the help of the initial states *Init* (line 2). The algorithm does not impose any specific requirements on the invariants (it is sound even for the trivial invariant ⊤) or on the method that detects them. In many cases, auxiliary invariants make the algorithm converge faster. Similarly to Algorithms 1–2, Algorithm 3 splits *Init* and *Tr* into a set of formulas and generalizes them to a grammar. The difference lies in the type of formulas sampled from the grammar (ineq vs sum) and their use in the synthesis loop: Algorithm 3 treats sampled candidates as *refinement constraints* and attempts to apply Lemma 1 (line 6).

The algorithm maintains a stack of refinement constraints *Refs*. In the first iteration, *Refs* is empty, and thus the algorithm tries to apply Lemma 1 to the original program. For that application, a ∀∃-formula is constructed and checked for validity. Intuitively, the formula expresses the ability of *Body* to transition each state which satisfies *Guard* to a state which satisfies *Guard* as well. If the validity of the ∀∃-formula is proven, the algorithm converges (line 7). Otherwise, a refinement of P needs to be guessed. Thus, the algorithm samples a new formula (line 16) using the production rule ineq, which is described in Sect. 3, pushes it to *Refs*, and iterates. Note that G permits formulas over *V* only (i.e., it restricts *Guard*); however, in principle it can be extended to sampling formulas over *V* ∪ *V′* (thus restricting *Body* as well).

For the progress of the algorithm, it must keep track of how each new candidate *cand* relates to the constraints already belonging to *Refs*. That is, *cand* should not be implied by *Guard* ∧ ⋀<sub>r∈Refs</sub> r, since otherwise the ∀∃-formula in the next iteration would not change. Also, *cand* should not over-constrain the loop guard, and thus it is important to check that after adding *cand* to the constraints from *Guard* and *Refs*, the loop guard is still reachable from the initial states. Both of these checks are performed before the sampling (line 9). After the sampling, the necessary adjustments to the probability distributions assigned to the production rules of the grammar [16] are applied to ensure the same refinement candidates are not re-sampled again (line 17).

Because by construction G cannot generate conjunctions of constraints, the algorithm handles conjunctions externally. This is useful when a single constraint is not enough for the application of Lemma 1 and must be strengthened by another constraint. On the other hand, it also might be necessary to withdraw some sampled candidates before converging. For this reason, Algorithm 3 maintains a stack *Gramms* of grammars and handles it synchronously with the stack *Refs* (lines 12–14 and 18–19). When all candidates from a grammar have been considered unsuccessfully, the algorithm pops the latest candidate from *Refs* and rolls back to the grammar used in the previous iteration. Additionally, a maximum size of *Refs* can be specified to avoid considering too deep refinements.
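This stack discipline can be sketched as a backtracking search with the checks replaced by caller-supplied stubs; the names and the toy instance below are ours, not the tool's implementation:

```python
# Sketch of Algorithm 3's push/pop discipline over Refs and Gramms.

def nonterm_ref(grammar, fe_holds, admissible):
    refs, pools = [], [list(grammar)]
    while True:
        if fe_holds(refs):                       # line 6
            return "does not terminate", refs
        pool = pools[-1]
        while pool and not admissible(pool[-1], refs):
            pool.pop()                           # line 9: reject candidate
        if not pool:                             # grammar exhausted
            if not refs:
                return "unknown", refs           # line 10
            refs.pop(); pools.pop()              # lines 12-13: roll back
            continue
        cand = pool.pop()                        # lines 16-17: sample+adjust
        refs.append(cand)                        # line 18
        pools.append([c for c in grammar if c not in refs])   # line 19

# Toy instance in the spirit of Example 2: the forall-exists check
# holds once both x<K and y<K are in Refs; x<K is "inadmissible"
# while the unhelpful x>0 is on the stack, forcing a rollback.
grammar = ["x<K", "y<K", "x>0"]
fe = lambda refs: {"x<K", "y<K"} <= set(refs)
adm = lambda c, refs: c not in refs and not (c == "x<K" and "x>0" in refs)
res, refs = nonterm_ref(grammar, fe, adm)
assert res == "does not terminate" and set(refs) == {"x<K", "y<K"}
```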

**Theorem 3.** *If Algorithm 3 returns* does not terminate *for program* P*, then* P *does not terminate.*

Indeed, the constraints that belong to *Refs* in the last iteration of the algorithm give rise to a refinement P′ of P, such that P′ = ⟨*V* ∪ *V′*, *Init*, *Tr* ∧ ⋀<sub>r∈Refs</sub> r⟩. The satisfiability check (line 9) and the validity check (line 6) passed, which corresponds to the conditions of Lemma 1. Thus, P′ does not terminate, and consequently it has an infinite trace. Finally, since P′ refines P, all traces (including infinite ones) of P′ belong to P, and P does not terminate as well.

#### **5.2 Integrating Algorithms Together**

With a few exceptions [30,39], existing algorithms address either the task of proving or the task of disproving termination. The goal of this paper is to show that both tasks benefit from syntax-guided techniques. While an algorithmic integration of several orthogonal techniques is itself a challenging problem, it is not the focus of our paper. Still, we use a straightforward idea here: since each presented algorithm has one main loop, an iteration of Algorithm 1 can be followed by an iteration of Algorithm 2 and, in turn, by an iteration of Algorithm 3 (i.e., in a lockstep fashion). A positive result obtained by any algorithm forces all remaining algorithms to terminate. Based on our experiments, described in detail in Sect. 6, the majority of benchmarks were proven either terminating or non-terminating by one of the algorithms within seconds. This justifies why the lockstep execution of all algorithms would not bring significant overhead in practice.
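The lockstep scheduling idea can be sketched as follows (our own scheduling sketch, not FreqTerm's actual implementation): each prover is modeled as a generator that yields `None` per unfinished iteration or a verdict when it succeeds:

```python
# Minimal round-robin lockstep of several iterative provers.

def lockstep(*provers):
    """Run one iteration of each prover per round; the first verdict
    obtained by any of them stops all the others."""
    active = [iter(p) for p in provers]
    while active:
        for p in list(active):
            try:
                verdict = next(p)
            except StopIteration:
                active.remove(p)          # this prover gave up
                continue
            if verdict is not None:
                return verdict            # positive result wins
    return "unknown"

def slow_terminator():                    # verdict on its 3rd iteration
    yield None
    yield None
    yield "terminates"

def hopeless(n):                          # burns n iterations, no verdict
    for _ in range(n):
        yield None

assert lockstep(hopeless(10), slow_terminator()) == "terminates"
assert lockstep(hopeless(2), hopeless(3)) == "unknown"
```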

#### **6 Evaluation**

We have implemented the algorithms for proving termination and non-termination in a tool called FreqTerm<sup>4</sup>. It is developed on top of FreqHorn [16], uses it for Horn solving, and also supports other Horn solvers, Spacer3 [26] and μZ [24]. To solve ∀∃-formulas, FreqTerm uses the AE-VAL tool [15]. All symbolic reasoning is ultimately performed by the Z3 SMT solver [11].

FreqTerm takes as input a program encoded as a system of linear constrained Horn clauses (CHC). It supports any programming language, as long as a translator from it to CHCs exists. For encoding benchmarks to CHCs, we used SeaHorn v.0.1.0-rc3. To the best of our knowledge, FreqTerm is the only (non)-termination prover that supports a selection of Horn solvers in the backend. This allows the prover to leverage advancements in Horn solving easily.

We have compared FreqTerm against AProVE rev. c181f40 [18], Ultimate Automizer v.0.1.23 [22], and HipTNT+ v.1.0 [30]. The rest of the section summarizes three sets of experiments. Sections 6.1 and 6.2 discuss the comparison on small but tricky programs, respectively terminating and non-terminating, which shows that our approach is applicable to a wide range of conceptually challenging problems. In Sect. 6.3, we target several large-scale benchmarks and show that FreqTerm is capable of significantly pushing the boundaries of termination and non-termination proving. In total, we considered 856 benchmarks of various sizes and complexity. All experiments were conducted on a Linux SMP machine, Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40 GHz, 56 CPUs, 377 GB RAM.

#### **6.1 Performance on Terminating Benchmarks**

We considered **171** terminating programs<sup>5</sup> from the Termination category of SVCOMP and programs crafted by ourselves. Altogether, the four tools in our experiment were able to prove termination of 168 of them within a timeout of 60 s and

<sup>4</sup> The source code of the tool is publicly available at https://goo.gl/HecBWc.

<sup>5</sup> These benchmarks are available at https://goo.gl/MPimXE.

**Fig. 4.** FreqTerm vs respectively Ultimate Automizer, AProVE, and HipTNT+.

left only three programs without a verdict. AProVE verified 76 benchmarks, HipTNT+ 90 (including 3 that no other tool solved), and Ultimate Automizer 105 (including 4 that no other tool solved). FreqTerm, implementing Algorithms 1–2 and relying on different solvers, verified in total **155** (including **30** that no other tool solved). In particular, Algorithm 1 instantiated with Spacer3 proved termination of 88 programs, with μZ 79, and with FreqHorn 80. Algorithm 2 instantiated with Spacer3 proved termination of 92 programs, with μZ 109, and with FreqHorn 74.

A scatterplot with logarithmic scales on the axes in Fig. 4(a) compares the best running times of FreqTerm with the running times of competing tools. Each point represents a pair of a FreqTerm run (x-axis) and a competing tool's run (y-axis). Intuitively, green points represent cases where FreqTerm outperforms the competitor. On average, for programs solved by both FreqTerm and Ultimate Automizer, FreqTerm is 29 times faster (speedup calculated as the ratio of geometric means of the corresponding runs). In a similar setting, FreqTerm is 32 times faster than AProVE. However, FreqTerm is 2 times slower than HipTNT+. The evaluation further revealed (in Sect. 6.3) that the latter tool is efficient only on small programs (around 10 lines of code each), and on large-scale benchmarks it exceeds the timeout.

#### **6.2 Performance on Non-terminating Benchmarks**

We considered **176** non-terminating programs<sup>6</sup> from the Termination category of SVCOMP and programs crafted by ourselves. Altogether, the four tools proved non-termination of 172 of them: AProVE 35, HipTNT+ 92, Ultimate Automizer 123, and Algorithm 3 implemented in FreqTerm **152**. Additionally, we evaluated the effect of ∀∃-solving in FreqTerm. For that reason, we implemented a version of Algorithm 3 in which non-termination is reduced to safety, but the conceptual SyGuS-based refinement generator remained the same. This implementation used Spacer3 for proving that the candidate refinement *can never* exit the loop. Among the 176 benchmarks, this routine solved only 105, which is 30% fewer than Algorithm 3. However, it managed to verify 8 benchmarks that Algorithm 3 could not (we believe because Spacer3 was able to add an auxiliary inductive invariant).

The logarithmic scatterplot in Fig. 4(b) compares FreqTerm with the running times of competing tools. On average, FreqTerm is 41 times faster than Ultimate Automizer, 73 times faster than AProVE, and exhibits roughly similar runtimes to HipTNT+ (again, here we considered only programs solved by both tools). Based on these experiments, we conclude that currently FreqTerm is more effective and more efficient at synthesizing non-terminating program refinements than at synthesizing termination arguments.

#### **6.3 Large-Scale Benchmarks**

We considered several large-scale benchmarks arising from Event-Condition-Action (ECA) systems that describe reactive behavior [1]. We considered various modifications of five challenging ECAs<sup>7</sup>. Each ECA consists of one large loop, where each iteration reads an input and modifies its internal state. If an unexpected input is read, the ECA terminates.

In our first case study, we aimed to prove non-termination of the given ECAs, i.e., that for any reachable internal state there exists an input value that would keep the ECA alive. The main challenges appeared to be the size of the benchmarks (up to 10000 lines of C code per loop) and the reliance on an auxiliary inductive invariant. With the extra support of Spacer3 to provide the invariant, FreqTerm was able to prove non-termination of a wide range of programs. Among all the competing tools, only Ultimate Automizer was able to handle these benchmarks, but it verified only a small fraction of them within a 2 h timeout. In contrast, FreqTerm solved 301 out of 302 tasks and outperformed Ultimate Automizer by up to several orders of magnitude (i.e., from seconds to hours). Table 1 contains a brief summary of our experimental evaluation.<sup>8</sup>

In our second case study, we instrumented the ECAs by adding extra conditions to the loop guards, thus imposing an implicit upper bound on the number

<sup>6</sup> These benchmarks are available at https://goo.gl/bZbuA2.

<sup>7</sup> These benchmarks are available at https://goo.gl/7mc2Ww.

<sup>8</sup> To calculate average timings, we excluded cases when the tool exceeded timeout.


**Table 1.** FreqTerm vs Ultimate Automizer on non-terminating ECAs (302).

**Table 2.** FreqTerm vs Ultimate Automizer on terminating ECAs (207).


of loop iterations, and applied the tools to prove termination<sup>9</sup> (shown in Table 2). Again, only Ultimate Automizer was able to compete with FreqTerm and, interestingly, it was more successful here than in the first case study. Encouragingly, FreqTerm solved all but one instance and was consistently faster.

#### **7 Related Work**

*Proving Termination.* A wide range of state-of-the-art methods are based on iterative reasoning driven by counterexamples [4,5,9,10,19,21,23,27,29,36] whose goal is to show that transitions cannot be executed forever. These approaches typically combine termination arguments, proven independently, but none of them leverages the syntax of programs during the analysis.

A smaller class of termination analyzers is based on various types of learning. In particular, [39] discovers a termination argument from attempts to prove that no program state is terminating; [34] exploits information derived from tests; and [37] guesses and checks transition invariants (over-approximations of the reachable transitive closure of the transition relation) from libraries of templates. Closest to our approach, [31] guesses and checks transition invariants using loop guards and branch conditions. In contrast, our algorithms guess lower bounds for auxiliary program counters and extensively use all available source code for guessing candidates.

<sup>9</sup> The task of adding interesting guards appeared to be non-trivial, so we were able to instrument only a part of all non-terminating benchmarks.

*Proving Non-termination.* Traditional algorithms, e.g. [3,6,8,20,22], are based on a search for lasso-shaped traces and a discovery of *recurrence sets*, i.e., states that are visited infinitely often. For instance, [32] searches for a geometric series in lasso-shaped traces. Our algorithm discovers *existential* recurrence sets and does not deal with traces at all: it handles their abstraction via a ∀∃-formula.

A reduction to safety attracts significant attention here as well. In particular, [40] relies only on invariant generation to show that the loop guard is also satisfied; [19] infers weakest preconditions over inputs under which the program is non-terminating; and [7,28] iteratively eliminate terminating traces through a loop by adding extra assumptions. In contrast, our approach does not reduce to safety and thus does not necessarily require invariants. However, we observed that invariants, if provided, often accelerate our verification process in practice.

*Syntax-Guided Synthesis.* SyGuS [2] is applied to various tasks related to program synthesis, e.g., [13,17,25,33,35,41]. However, the formal grammar in those applications is typically given or constructed from user-provided examples. To the best of our knowledge, the only application of SyGuS to automatic program analysis was proposed by [14,16], and it inspired our approach. Originally, the formal grammar, constructed from the verification condition, was iteratively used to guess and check only inductive invariants. In this paper, we showed that similar reasoning is practical and easily transferable across applications.

#### **8 Conclusion**

We have presented new algorithms for the synthesis of termination arguments and non-terminating program refinements. Driven by SyGuS, they iteratively generate candidate formulas which tend to follow syntactic patterns obtained from the source code. By construction, the number of possible candidates is always finite, so the search space is always relatively small. The algorithms rely on recent advances in constraint solving and do not depend on a particular backend engine, so the performance of checking the validity of a candidate can be improved by advancements in solvers. Our implementation FreqTerm is evaluated on a wide range of terminating and non-terminating benchmarks. It is competitive with the state of the art, and it significantly outperforms other tools when proving non-termination of large-scale Event-Condition-Action systems.

In future work, it would be interesting to investigate synergetic ways of integrating the proposed algorithms together, as well as exploiting strengths of different backend Horn solvers for different verification tasks.

#### **References**




**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **Model Checking Quantitative Hyperproperties**

Bernd Finkbeiner, Christopher Hahn, and Hazem Torfah

Reactive Systems Group, Saarland University, Saarbrücken, Germany {finkbeiner,hahn,torfah}@react.uni-saarland.de

**Abstract.** Hyperproperties are properties of sets of computation traces. In this paper, we study quantitative hyperproperties, which we define as hyperproperties that express a bound on the number of traces that may appear in a certain relation. For example, quantitative non-interference limits the amount of information about certain secret inputs that is leaked through the observable outputs of a system. Quantitative noninterference thus bounds the number of traces that have the same observable input but different observable output. We study quantitative hyperproperties in the setting of HyperLTL, a temporal logic for hyperproperties. We show that, while quantitative hyperproperties can be expressed in HyperLTL, the running time of the HyperLTL model checking algorithm is, depending on the type of property, exponential or even doubly exponential in the quantitative bound. We improve this complexity with a new model checking algorithm based on model-counting. The new algorithm needs only logarithmic space in the bound and therefore improves, depending on the property, exponentially or even doubly exponentially over the model checking algorithm of HyperLTL. In the worst case, the new algorithm needs polynomial space in the size of the system. Our Max#Sat-based prototype implementation demonstrates, however, that the counting approach is viable on systems with nontrivial quantitative information flow requirements such as a passcode checker.

#### **1 Introduction**

Model checking algorithms [17] are the cornerstone of computer-aided verification. As their input consists of both the system under verification and a logical formula that describes the property to be verified, they uniformly solve a wide range of verification problems, such as all verification problems expressible in linear-time temporal logic (LTL), computation-tree logic (CTL), or the modal μ-calculus. Recently, there has been a lot of interest in extending model checking from standard trace and tree properties to *information flow* policies like observational determinism or quantitative information flow. Such policies are called

This work was partly supported by the ERC Grant 683300 (OSARES) and by the German Research Foundation (DFG) in the Collaborative Research Center 1223.

© The Author(s) 2018

H. Chockler and G. Weissenbacher (Eds.): CAV 2018, LNCS 10981, pp. 144–163, 2018. https://doi.org/10.1007/978-3-319-96145-3_8

*hyperproperties* [21] and can be expressed in HyperLTL [18], an extension of LTL with trace quantifiers and trace variables. For example, *observational determinism* [47], the requirement that any pair of traces that have the same observable input also have the same observable output, can be expressed as the HyperLTL formula $\forall \pi. \forall \pi'.\ \square(\pi =_I \pi') \to \square(\pi =_O \pi')$. For many information flow policies of interest, including observational determinism, there is no longer a need for property-specific algorithms: it has been shown that the standard HyperLTL model checking algorithm [26] performs just as well as a specialized algorithm for the respective property.

The class of hyperproperties studied in this paper is one where, by contrast, the standard model checking algorithm performs badly. We are interested in *quantitative hyperproperties*, i.e., hyperproperties that express a bound on the number of traces that may appear in a certain relation. A prominent example of this class of properties is *quantitative non-interference* [43,45], where we allow some flow of information but, at the same time, limit the amount of information that may be leaked. Such properties are used, for example, to describe the correct behavior of a password check, where some information flow is unavoidable ("the password was incorrect"), and perhaps some extra information flow is acceptable ("the password must contain a special character"), but the information should not suffice to guess the actual password. In HyperLTL, quantitative non-interference can be expressed [18] as the formula

$$\forall \pi_0.\ \forall \pi_1 \dots \forall \pi_{2^c}.\ \left(\bigwedge_i \square(\pi_i =_I \pi_0)\right) \to \left(\bigvee_{i \neq j} \square(\pi_i =_O \pi_j)\right).$$

The formula states that there do not exist $2^c + 1$ traces (corresponding to more than c bits of information) with the same observable input but different observable output. The bad performance of the standard model checking algorithm is a consequence of the fact that the $2^c + 1$ traces are tracked simultaneously. For this purpose, the model checking algorithm builds and analyzes a $(2^c + 1)$-fold self-composition of the system.

We present a new model checking algorithm for quantitative hyperproperties that avoids the construction of the huge self-composition. The key idea of our approach is to use *counting* rather than *checking* as the basic operation. Instead of building the self-composition and then *checking* the satisfaction of the formula, we add new atomic propositions and then *count* the number of sequences of evaluations of the new atomic propositions that satisfy the specification. Quantitative hyperproperties are expressions of the following form:

$$
\forall \pi_1 \dots \forall \pi_k.\ \varphi \to (\#\sigma : X.\ \psi \lhd n),
$$

where ⊲ ∈ {≤, <, ≥, >, =}. The universal quantifiers introduce a set of reference traces against which other traces can be compared. The formulas ϕ and ψ are HyperLTL formulas. The counting quantifier #σ : X. ψ counts the number of paths σ with different valuations of the atomic propositions X that satisfy ψ. The requirement that no more than c bits of information are leaked is the following quantitative hyperproperty:

$$\forall \pi.\ \#\sigma \colon O.\ \square(\pi =_I \sigma) \le 2^c$$

As we show in the paper, such expressions do not change the expressiveness of the logic; however, they allow us to express quantitative hyperproperties in exponentially more concise form. The counting-based model checking algorithm then maintains this advantage with a logarithmic counter, resulting in exponentially better performance in both time and space.

The viability of our counting-based model checking algorithm is demonstrated on a SAT-based prototype implementation. For quantitative hyperproperties of interest, such as bounded leakage of a password checker, our algorithm shows promising results, as it significantly outperforms existing model checking approaches.

#### **1.1 Related Work**

Quantitative information-flow has been studied extensively in the literature. See, for example, the following selection of contributions on this topic: [1,14,19,32,34,43]. Multiple verification methods for quantitative information-flow have been proposed for sequential systems, for example, static analysis techniques [15], approximation methods [35], equivalence relations [3,22], and randomized methods [35]. Quantitative information-flow for multi-threaded programs was considered in [11].

The study of quantitative information-flow in a reactive setting gained a lot of attention recently after the introduction of hyperproperties [21] and the idea of verifying the self-composition of a reactive system [6] in order to relate traces to each other. There are several possibilities to measure the amount of leakage, such as Shannon entropy [15,24,37], guessing entropy [3,34], and min-entropy [43]. A classification of quantitative information-flow policies as safety and liveness hyperproperties was given in [46]. While several verification techniques for hyperproperties exist [5,31,38,42], the literature was missing general approaches to quantitative information-flow control. SecLTL [25] was introduced as the first general approach to model check (quantitative) hyperproperties, before HyperLTL [18] and its corresponding model checker [26] were introduced as a temporal logic for hyperproperties that subsumes the previous approaches.

Using counting to compute the number of solutions of a given formula is studied in the literature as well and includes many probabilistic inference problems, such as Bayesian net reasoning [36], and planning problems, such as computing robustness of plans in incomplete domains [40]. State-of-the-art tools for propositional model counting are Relsat [33] and c2d [23]. Algorithms for counting models of temporal logics and automata over infinite words have been introduced in [27,28,44]. The counting of projected models, i.e., when some parts of the models are irrelevant, was studied in [2], for which tools such as #CLASP [2] and DSharp P [2,41] exist. Our SAT-based prototype implementation is based on a reduction to a Max#SAT [29] instance, for which a corresponding tool exists.

Among the existing tools for computing the amount of information leakage, for example, QUAIL [8], which analyzes programs written in a specific while-language, and LeakWatch [12], which estimates the amount of leakage in Java programs, Moped-QLeak [9] is closest to our approach. However, its approach of computing a symbolic summary as an Algebraic Decision Diagram is, in contrast to ours, solely based on model counting, not maximum model counting.

#### **2 Preliminaries**

#### **2.1 HyperLTL**

HyperLTL [18] extends linear-time temporal logic (LTL) with trace variables and trace quantifiers. Let *AP* be a set of *atomic propositions*. A *trace* t is an infinite sequence over subsets of the atomic propositions. We define the set of traces *TR* := (2*AP* )<sup>ω</sup>. A subset <sup>T</sup> <sup>⊆</sup> *TR* is called a *trace property* and a subset <sup>H</sup> <sup>⊆</sup> <sup>2</sup>*TR* is called a *hyperproperty*. We use the following notation to manipulate traces: let <sup>t</sup> <sup>∈</sup> *TR* be a trace and <sup>i</sup> <sup>∈</sup> <sup>N</sup> be a natural number. <sup>t</sup>[i] denotes the i-th element of t. Therefore, t[0] represents the starting element of the trace. Let <sup>j</sup> <sup>∈</sup> <sup>N</sup> and <sup>j</sup> <sup>≥</sup> <sup>i</sup>. <sup>t</sup>[i, j] denotes the sequence <sup>t</sup>[i] <sup>t</sup>[<sup>i</sup> + 1] ...t[<sup>j</sup> <sup>−</sup> 1] <sup>t</sup>[j]. <sup>t</sup>[i,∞] denotes the infinite suffix of t starting at position i.

*HyperLTL Syntax.* Let V be an infinite supply of trace variables. The syntax of HyperLTL is given by the following grammar:

$$\begin{array}{rcl}\psi & ::= & \exists \pi. \psi \;\mid\; \forall \pi. \psi \;\mid\; \varphi \\ \varphi & ::= & a_{\pi} \;\mid\; \neg \varphi \;\mid\; \varphi \lor \varphi \;\mid\; \mathsf{O}\varphi \;\mid\; \varphi\ \mathsf{U}\ \varphi\end{array}$$

where a ∈ *AP* is an atomic proposition and π ∈ V is a trace variable. Note that atomic propositions are indexed by trace variables. The quantification over traces makes it possible to express properties like "on all traces ψ must hold", which is expressed by ∀π. ψ. Dually, one can express that "there exists a trace such that ψ holds", which is denoted by ∃π. ψ. The derived operators $\Diamond$ (eventually), $\square$ (globally), and W (weak until) are defined as for LTL. We abbreviate the formula $\bigwedge_{x \in X}(x_\pi \leftrightarrow x_{\pi'})$, expressing that the traces π and π′ are equal with respect to a set X ⊆ *AP* of atomic propositions, by π =<sub>X</sub> π′. Furthermore, we call a trace variable π *free* in a HyperLTL formula if there is no quantification over π, and we call a HyperLTL formula ϕ *closed* if there exists no free trace variable in ϕ.

*HyperLTL Semantics.* A HyperLTL formula defines a *hyperproperty*, i.e., a set of sets of traces. A set T of traces satisfies the hyperproperty if it is an element of this set of sets. Formally, the semantics of HyperLTL formulas is given with respect to a *trace assignment* Π from V to *TR*, i.e., a partial function mapping trace variables to actual traces. Π[π → t] denotes that π is mapped to t, with everything else mapped according to Π. Π[i,∞] denotes the trace assignment that maps each π to Π(π)[i,∞].


We say a set of traces T *satisfies* a HyperLTL formula ϕ if Π |=<sup>T</sup> ϕ, where Π is the empty trace assignment.

#### **2.2 System Model**

A *Kripke structure* is a tuple K = (S, s0, δ, *AP*, L) consisting of a set of states <sup>S</sup>, an initial state <sup>s</sup><sup>0</sup> <sup>∈</sup> <sup>S</sup>, a transition function <sup>δ</sup> : <sup>S</sup> <sup>→</sup> <sup>2</sup><sup>S</sup>, a set of *atomic propositions AP*, and a *labeling function* <sup>L</sup> : <sup>S</sup> <sup>→</sup> <sup>2</sup>*AP* , which labels every state with a set of atomic propositions. We assume that each state has a successor, i.e., δ(s) ≠ ∅ for every s ∈ S. This ensures that every run on a Kripke structure can always be extended to an infinite run. We define a *path* of a Kripke structure as an infinite sequence of states <sup>s</sup>0s<sup>1</sup> ···∈ <sup>S</sup><sup>ω</sup> such that <sup>s</sup><sup>0</sup> is the initial state of <sup>K</sup> and <sup>s</sup>i+1 <sup>∈</sup> <sup>δ</sup>(si) for every <sup>i</sup> <sup>∈</sup> <sup>N</sup>. We denote the set of all paths of <sup>K</sup> that start in a state <sup>s</sup> with *Paths*(K, s). Furthermore, *Paths*∗(K, s) denotes the set of all path prefixes and *Paths*<sup>ω</sup>(K, s) the set of all path suffixes. A *trace* of a Kripke structure is an infinite sequence of sets of atomic propositions <sup>L</sup>(s0), L(s1), ··· ∈ (2*AP* )<sup>ω</sup>, such that <sup>s</sup><sup>0</sup> is the initial state of <sup>K</sup> and <sup>s</sup>i+1 <sup>∈</sup> <sup>δ</sup>(si) for every <sup>i</sup> <sup>∈</sup> <sup>N</sup>. We denote the set of all traces of K that start in a state s with *TR*(K, s). We say that a Kripke structure K *satisfies* a HyperLTL formula ϕ if its set of traces satisfies ϕ, i.e., if Π |=*TR*(K,s0) ϕ, where Π is the empty trace assignment.
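The definitions above can be sketched concretely. The following is a minimal, illustrative encoding of a Kripke structure; the class and method names are ours, not the paper's, and since traces are infinite we only enumerate finite trace prefixes:

```python
# Minimal sketch of a Kripke structure K = (S, s0, delta, AP, L).
class Kripke:
    def __init__(self, states, init, delta, label):
        self.states, self.init = states, init
        self.delta = delta   # delta: state -> set of successor states
        self.label = label   # label: state -> frozenset of atomic propositions

    def trace_prefixes(self, length):
        """All label sequences of the given length starting in the initial state."""
        prefixes = [((self.label[self.init],), self.init)]
        for _ in range(length - 1):
            prefixes = [(tr + (self.label[s2],), s2)
                        for tr, s in prefixes for s2 in self.delta[s]]
        return {tr for tr, _ in prefixes}  # distinct trace prefixes only

# Every state has a successor (delta(s) != {}), so all runs are infinite.
K = Kripke({0, 1}, 0,
           {0: {0, 1}, 1: {0}},
           {0: frozenset(), 1: frozenset({'p'})})
```

For instance, `K.trace_prefixes(3)` enumerates the distinct length-3 prefixes of traces in *TR*(K, s0).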

#### **2.3 Automata over Infinite Words**

In our construction we use automata over infinite words. A *Büchi automaton* is a tuple B = (Q, Q0, δ, Σ, F), where Q is a set of states, Q<sup>0</sup> is a set of initial states, <sup>δ</sup> : <sup>Q</sup> <sup>×</sup> <sup>Σ</sup> <sup>→</sup> <sup>2</sup><sup>Q</sup> is a transition relation, and <sup>F</sup> <sup>⊂</sup> <sup>Q</sup> are the accepting states. A run of <sup>B</sup> on an infinite word <sup>w</sup> <sup>=</sup> <sup>α</sup>1α<sup>2</sup> ···∈ <sup>Σ</sup><sup>ω</sup> is an infinite sequence <sup>r</sup> <sup>=</sup> <sup>q</sup>0q<sup>1</sup> ···∈ <sup>Q</sup><sup>ω</sup> of states, where <sup>q</sup><sup>0</sup> <sup>∈</sup> <sup>Q</sup><sup>0</sup> and for each <sup>i</sup> <sup>≥</sup> 0, <sup>q</sup>i+1 <sup>∈</sup> <sup>δ</sup>(qi, αi+1). We define **Inf**(r) = {q ∈ Q | ∀i∃j > i. r<sup>j</sup> = q}. A run r is called accepting if **Inf**(r) ∩ F ≠ ∅. A word w is accepted by B, and called a *model* of B, if there is an accepting run of B on w.
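The acceptance condition is easy to illustrate on lasso-shaped runs (a finite stem followed by a loop repeated forever), the shape that emptiness checks actually work with. This is an illustrative sketch, not part of the paper's construction:

```python
# Büchi acceptance of a lasso-shaped run (stem followed by a loop repeated
# forever). For such a run, Inf(r) is exactly the set of states occurring
# in the loop, so the run is accepting iff the loop meets F.
def lasso_accepting(stem, loop, accepting_states):
    """stem, loop: finite lists of states; accepting_states: the set F."""
    inf = set(loop)                       # states visited infinitely often
    return bool(inf & accepting_states)   # Inf(r) ∩ F ≠ ∅
```

For example, a run with loop `[1, 2]` and F = {2} is accepting, while a run that visits state 2 only in its stem is not.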

Furthermore, an *alternating automaton*, whose runs generalize from sequences to trees, is a tuple A = (Q, Q0, δ, Σ, F). Q, Q0, Σ, and F are defined as above, and <sup>δ</sup> : <sup>Q</sup> <sup>×</sup> <sup>Σ</sup> <sup>→</sup> $\mathbb{B}^+(Q)$ is a transition function, which maps a state and a symbol to a positive Boolean combination of states. Thus, a run(-tree) of an alternating Büchi automaton A on an infinite word w is a Q-labeled tree. A word w is accepted by A, and called a *model*, if there exists a run-tree T such that all paths p through T are accepting, i.e., **Inf**(p) ∩ F ≠ ∅.

A strongly connected component (SCC) in A is a maximal strongly connected component of the graph induced by the automaton. An SCC is called *accepting* if one of its states is an accepting state in A.

#### **3 Quantitative Hyperproperties**

Quantitative hyperproperties are properties of sets of computation traces that express a bound on the number of traces that may appear in a certain relation. In the following, we study quantitative hyperproperties that are specified in terms of HyperLTL formulas. We consider expressions of the following general form:

$$
\forall \pi_1, \dots, \pi_k.\ \varphi \to \left(\#\sigma : A.\ \psi \lhd n\right),
$$

Both the universally quantified variables π1,...,π<sup>k</sup> and the variable σ after the *counting* operator # are trace variables; ϕ is a HyperLTL formula over atomic propositions AP and free trace variables π<sup>1</sup> ...πk; A ⊆ AP is a set of atomic propositions; ψ is a HyperLTL formula over atomic propositions AP and free trace variables π<sup>1</sup> ...π<sup>k</sup> and, additionally, σ. The operator ⊲ ∈ {<, ≤, =, >, ≥} is a comparison operator, and n ∈ ℕ is a natural number.

For a given set of traces T and a valuation of the trace variables π1,...,πk, the term #σ : A. ψ computes the number of traces σ in T that differ in their valuation of the atomic propositions in A and satisfy ψ. The expression #σ : A. ψ ⊲ n is *true* iff the resulting number satisfies the comparison with n. Finally, the complete expression ∀π1,...,πk. ϕ → (#σ : A. ψ ⊲ n) is *true* iff for all combinations π1,...,π<sup>k</sup> of traces in T that satisfy ϕ, the comparison #σ : A. ψ ⊲ n is satisfied.
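The counting semantics can be sketched over a *finite* set of *finite* traces (the paper's traces are infinite; this is only an illustration of the counting idea, with names of our choosing). Each trace is a tuple of sets of atomic propositions, and `psi` is a predicate on candidate traces σ:

```python
# Illustrative semantics of the counting term #σ : A. ψ.
def count_sigma(traces, A, psi):
    projections = set()
    for sigma in traces:
        if psi(sigma):
            # A-projection: keep only the propositions in A at each position
            projections.add(tuple(frozenset(pos & A) for pos in sigma))
    return len(projections)  # traces that agree on A are counted once

traces = [
    ({'i', 'o'}, {'o'}),
    ({'o'},      {'o'}),   # same {'o'}-projection as the trace above
    ({'i'},      set()),
]
```

Here `count_sigma(traces, {'o'}, lambda s: True)` counts distinct {'o'}-projections: the first two traces collapse into one.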

*Example 1 (Quantitative non-interference).* Quantitative information-flow policies [13,20,30,34] allow the flow of a bounded amount of information. One way to measure leakage is with *min-entropy* [43], which quantifies the amount of information an attacker can gain given the answer to a single guess about the secret. The *bounding problem* [45] for min-entropy is to determine whether that amount is bounded from above by a constant 2<sup>c</sup>, corresponding to c bits. We assume that the program whose leakage is being quantified is deterministic, and assume that the secret input to that program is uniformly distributed. The bounding problem then reduces to determining that there is no tuple of 2<sup>c</sup> + 1 distinguishable traces [43,45]. Let O ⊆ AP be the set of observable outputs. A simple quantitative information flow policy is then the following quantitative hyperproperty, which bounds the number of distinguishable outputs to 2<sup>c</sup>, corresponding to a bound of c bits of information:

$$\#\sigma : O.\ \mathit{true} \le 2^c$$

A slightly more complicated information flow policy is quantitative noninterference. In quantitative non-interference, the bound must be satisfied for every individual input. Let I ⊆ AP be the observable inputs to the system. Quantitative non-interference is the following quantitative hyperproperty<sup>1</sup>:

$$\forall \pi.\ \#\sigma \colon O.\ \left(\square(\pi =_I \sigma)\right) \leq 2^c$$

For each trace π in the system, the property checks whether there are more than 2<sup>c</sup> traces σ that have the same observable input as π but different observable output.
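This check can be made concrete on an explicit finite set of finite traces (a sketch only; the paper's algorithm works on Kripke structures and infinite traces, and the function name is ours). A trace is represented as a pair (inputs, outputs):

```python
# Brute-force check of the quantitative non-interference property above.
def violates_qni(traces, c):
    bound = 2 ** c
    for pi_inputs, _ in traces:
        # all traces sigma with the same observable input as pi ...
        outputs = {out for inp, out in traces if inp == pi_inputs}
        # ... may produce at most 2^c distinct observable outputs
        if len(outputs) > bound:
            return True
    return False

traces = [(('a',), ('x',)), (('a',), ('y',)), (('a',), ('z',)), (('b',), ('x',))]
```

With c = 1, input `('a',)` admits three distinct outputs, exceeding the bound 2^1 = 2, so the property is violated; with c = 2 it holds.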

*Example 2 (Deniability).* A program satisfies *deniability* (see, for example, [7,10]) when there is no proof that a certain input occurred from simply observing the output, i.e., given an output of a program, one cannot derive the input that led to this output. A deterministic program satisfies deniability when each output can be mapped to at least two inputs. A quantitative variant of deniability requires that the number of corresponding inputs be larger than a given threshold. Quantitative deniability can be specified as the following quantitative hyperproperty:

$$\forall \pi.\ \#\sigma \colon I.\ \left(\square(\pi =_O \sigma)\right) > n$$

For all traces π of the system, we count the number of sequences σ in the system with a different input sequence and the same output sequence as π, i.e., for the fixed output sequence given by π, we count the number of input sequences that lead to this output.
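Mirroring the non-interference sketch, quantitative deniability can be illustrated on an explicit finite set of (inputs, outputs) pairs; the function name is ours, chosen for illustration:

```python
# Sketch of quantitative deniability over a finite trace set.
def satisfies_deniability(traces, n):
    for _, pi_outputs in traces:
        # distinct input sequences leading to the output sequence of pi
        inputs = {inp for inp, out in traces if out == pi_outputs}
        if len(inputs) <= n:
            return False  # this output pins the input down too precisely
    return True

traces = [(('a',), ('x',)), (('b',), ('x',)), (('c',), ('y',)), (('d',), ('y',))]
```

Every output here is produced by two distinct inputs, so the property holds for n = 1 but fails for n = 2.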

#### **4 Model Checking Quantitative Hyperproperties**

We present a model checking algorithm for quantitative hyperproperties based on model counting. The advantage of the algorithm is that its runtime complexity is independent of the bound n and thus avoids the n-fold self-composition necessary for any encoding of the quantitative hyperproperty in HyperLTL.

Before introducing our novel counting-based algorithm, we start with a translation of quantitative hyperproperties into HyperLTL formulas and establish an exponential lower bound for this representation.

#### **4.1 Standard Model Checking Algorithm: Encoding Quantitative Hyperproperties in HyperLTL**

The idea of the reduction is to check a lower bound of n traces by existentially quantifying over n traces, and to check an upper bound of n traces by *universally* quantifying over n + 1 traces. The resulting HyperLTL formula can be verified using the standard model checking algorithm for HyperLTL [18].

<sup>1</sup> We write π =<sub>A</sub> π′ short for π<sub>A</sub> = π′<sub>A</sub>, where π<sub>A</sub> is the A-projection of π.

**Theorem 1.** *Every quantitative hyperproperty* ∀π1,...,πk. ψ<sup>ι</sup> → (#σ : A. ψ ⊲ n) *can be expressed as a HyperLTL formula. For* ⊲ ∈ {≤} ({<})*, the HyperLTL formula has* n + k + 1 (*resp.* n + k) *universal trace quantifiers in addition to the quantifiers in* ψ<sup>ι</sup> *and* ψ*. For* ⊲ ∈ {≥} ({>})*, the HyperLTL formula has* k *universal trace quantifiers and* n (*resp.* n + 1) *existential trace quantifiers in addition to the quantifiers in* ψ<sup>ι</sup> *and* ψ*. For* ⊲ ∈ {=}*, the HyperLTL formula has* k + n + 1 *universal trace quantifiers and* n *existential trace quantifiers in addition to the quantifiers in* ψ<sup>ι</sup> *and* ψ*.*

*Proof.* For ⊲ ∈ {≤}, we encode the quantitative hyperproperty ∀π1,...,πk. ψ<sup>ι</sup> → (#σ : A. ψ ⊲ n) as the following HyperLTL formula:

$$\forall \pi_1, \dots, \pi_k.\ \forall \pi'_1, \dots, \pi'_{n+1}.\ \left(\psi_\iota \land \bigwedge_{i \neq j} \Diamond(\pi'_i \neq_A \pi'_j)\right) \to \left(\bigvee_i \neg \psi[\sigma \mapsto \pi'_i]\right)$$

where ψ[σ ↦ π′<sub>i</sub>] is the HyperLTL formula ψ with all occurrences of σ replaced by π′<sub>i</sub>. The formula states that there is no tuple of n + 1 traces π′1,...,π′n+1, pairwise different in the evaluation of A, that all satisfy ψ. In other words, for every tuple of n + 1 traces π′1,...,π′n+1 that differ in the evaluation of A, one of the traces must violate ψ. For ⊲ ∈ {<}, we use the same formula, with ∀π′1,...,π′<sub>n</sub> instead of ∀π′1,...,π′<sub>n+1</sub>.

For ⊲ ∈ {≥}, we encode the quantitative hyperproperty analogously as the HyperLTL formula

$$\forall \pi_1, \dots, \pi_k.\ \exists \pi'_1, \dots, \pi'_n.\ \psi_\iota \rightarrow \left(\bigwedge_{i \neq j} \Diamond(\pi'_i \neq_A \pi'_j)\right) \wedge \left(\bigwedge_i \psi[\sigma \mapsto \pi'_i]\right)$$

The formula states that there exist traces π′1,...,π′<sub>n</sub> that differ in the evaluation of A and that all satisfy ψ. For ⊲ ∈ {>}, we use the same formula, with ∃π′1,...,π′<sub>n+1</sub> instead of ∃π′1,...,π′<sub>n</sub>. Lastly, for ⊲ ∈ {=}, we encode the quantitative hyperproperty as a conjunction of the encodings for ≤ and for ≥.
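To make the quantifier growth in n explicit, the upper-bound encoding can be materialized as a formula string. This is purely illustrative; the placeholder names `psi_i`, `psi`, and `diff_A` are ours, not the paper's notation:

```python
# Sketch: build the HyperLTL encoding for ⊲ ∈ {≤} as a string.
def encode_upper_bound(k, n):
    univ = [f"pi{i}" for i in range(1, k + 1)]
    fresh = [f"pi'{i}" for i in range(1, n + 2)]      # n + 1 fresh traces
    quants = " ".join(f"forall {v}." for v in univ + fresh)
    # the fresh traces differ pairwise in the evaluation of A ...
    differ = " & ".join(f"F diff_A({a},{b})"
                        for i, a in enumerate(fresh)
                        for b in fresh[i + 1:])
    # ... in which case one of them must violate psi
    violate = " | ".join(f"!psi[{p}]" for p in fresh)
    return f"{quants} (psi_i & {differ}) -> ({violate})"
```

For k = 1 and n = 2 the result has 1 + (2 + 1) = 4 universal quantifiers, matching the n + k + 1 count of Theorem 1.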

*Example 3 (Quantitative non-interference in HyperLTL).* As discussed in Example 1, quantitative non-interference is the quantitative hyperproperty

$$
\forall \pi.\ \#\sigma \colon O.\ \square(\pi =_I \sigma) \le 2^c,
$$

where we measure the amount of leakage with min-entropy [43]. The bounding problem for min-entropy asks whether the amount of information leaked by a system is bounded by a constant 2<sup>c</sup>, where c is the number of bits. This is encoded in HyperLTL as the requirement that there are no 2<sup>c</sup> + 1 traces distinguishable in their output:

$$\forall \pi_0.\ \forall \pi_1 \dots \forall \pi_{2^c}.\ \left( \bigwedge_i \square(\pi_i =_I \pi_0) \right) \to \left( \bigvee_{i \neq j} \square(\pi_i =_O \pi_j) \right)$$

This formula is equivalent to the formalization of quantitative non-interference given in [26].

Model checking quantitative hyperproperties via the reduction to HyperLTL is very expensive. In the best case, when ⊲ ∈ {≤, <}, ψ<sup>ι</sup> does not contain existential quantifiers, and ψ does not contain universal quantifiers, we obtain a HyperLTL formula without quantifier alternations, where the number of quantifiers grows linearly with the bound n. For m quantifiers, the HyperLTL model checking algorithm [26] constructs and analyzes the m-fold self-composition of the Kripke structure. The running time of the model checking algorithm is thus exponential in the bound. If ⊲ ∈ {≥, >, =}, the encoding additionally introduces a quantifier alternation. The model checking algorithm checks quantifier alternations via a complementation of Büchi automata, which adds another exponent, resulting in an overall doubly exponential running time.
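The blow-up can be made concrete with a back-of-the-envelope calculation (illustrative arithmetic only): an m-fold self-composition of a structure with s states has s^m states, and quantitative non-interference needs m = 2^c + 1 copies, i.e., a state count that is doubly exponential in the number of leaked bits c:

```python
# State count of the m-fold self-composition used by the HyperLTL encoding.
def self_composition_size(num_states, c):
    m = 2 ** c + 1              # traces tracked simultaneously
    return num_states ** m      # states of the m-fold self-composition
```

Already for a 10-state system and c = 1 the self-composition has 10^3 states; for c = 3 even a 2-state system yields 2^9 = 512 states.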

The model checking algorithm we introduce in the next section avoids the n-fold self-composition needed in the model checking algorithm of HyperLTL and its complexity is independent of the bound n.

#### **4.2 Counting-Based Model Checking Algorithm**

A Kripke structure K = (S, s0,τ, *AP*, L) violates a quantitative hyperproperty

$$
\varphi = \forall \pi_1, \dots, \pi_k.\ \psi_\iota \to \left(\#\sigma : A.\ \psi \lhd n\right).
$$

if there is a k-tuple t = (π1,...,πk) of traces π<sup>i</sup> ∈ *TR*(K) that satisfies the formula

$$
\exists \pi_1, \dots, \pi_k.\ \psi_\iota \land \left(\#\sigma : A.\ \psi \,\overline{\lhd}\, n\right),
$$

where $\overline{\lhd}$ is the negation of the comparison operator ⊲. The tuple t then satisfies the property ψ<sup>ι</sup>, and the number of (k + 1)-tuples t = (π1,...,πk, σ) for σ ∈ *TR*(K) that satisfy ψ and differ pairwise in the A-projection of σ satisfies the comparison $\overline{\lhd}\ n$. (The A-projection of a sequence σ is defined as the sequence <sup>σ</sup><sup>A</sup> <sup>∈</sup> (2<sup>A</sup>)<sup>ω</sup> such that for every position <sup>i</sup> and every <sup>a</sup> <sup>∈</sup> <sup>A</sup> it holds that <sup>a</sup> <sup>∈</sup> <sup>σ</sup>A[i] if and only if a ∈ σ[i].) The tuples t can be captured by the automaton composed of the product of an automaton Aψι∧ψ, which accepts all (k + 1)-tuples of traces that satisfy both ψ<sup>ι</sup> and ψ, and a (k + 1)-fold self-composition of K. Each accepting run of the product automaton represents k + 1 traces of K that satisfy ψ<sup>ι</sup> ∧ ψ. On top of the product automaton, we apply a special counting algorithm, which we explain in detail in Sect. 4.4, and check whether the result satisfies the comparison with n.

Algorithm 1 gives a general picture of our model checking algorithm. The algorithm has two parts. The first part applies if the relation ⊲ is one of {≥, >}. In this case, the algorithm checks whether a sequence over *AP*<sup>ψ</sup> (the propositions in ψ) corresponds to infinitely many sequences over A. This is done by checking whether the product automaton B contains a so-called *doubly pumped lasso* (DPL), a subgraph with two connected lassos, with a unique sequence over *AP*<sup>ψ</sup> and different sequences over A. Such a doubly pumped lasso matches the same sequence over *AP*<sup>ψ</sup> with infinitely many sequences over A (more in Sect. 4.4). If no doubly pumped lasso is found, a projected model counting algorithm is applied in the second part of the algorithm in order to compute either the maximum or the minimum value, corresponding to the comparison operator ⊲. In the next subsections, we explain the individual parts of the algorithm in detail.

```
Algorithm 1. Counting-based Model Checking of Quantitative Hyperproperties

Input: Quantitative Hyperproperty ϕ = ∀π1 ... πk. ψι → (#σ : A. ψ ◁ n),
       Kripke Structure K = (S, s0, τ, AP, L)
Output: K |= ϕ
 1: B = QHLTL2BA(K, π1, ..., πk, ψι ∧ ψ)
 2: /* check infinity */
 3: if ◁̄ ∈ {≥, >} then
 4:     ce = DPL(B)
 5:     if ce ≠ ⊥ then
 6:         return ce
 7: /* apply projected counting algorithm */
 8: if ◁̄ ∈ {≥, >} then
 9:     ce = MaxCount(B, n, ◁̄)
10: else
11:     ce = MinCount(B, n, ◁̄)
12: return ce
```
#### **4.3 Büchi Automata for Quantitative Hyperproperties**

For a quantitative hyperproperty ϕ = ∀π1 ... πk. ψι → (#σ : A. ψ ◁ n) and a Kripke structure K = (S, s0, τ, *AP*, L), we first construct an alternating automaton A_{ψι∧ψ} for the HyperLTL property ψι ∧ ψ. Let A_{ψ1} = (Q1, q0,1, Σ, δ1, F1) and A_{ψ2} = (Q2, q0,2, Σ, δ2, F2) be alternating automata for subformulas ψ1 and ψ2, where Σ = 2^{*AP*_ϕ} and *AP*_ϕ is the set of all indexed atomic propositions that appear in ϕ. A_{ψι∧ψ} is constructed using the following rules<sup>2</sup>:


For a quantified formula ϕ = ∃π. ψ1, we construct the product automaton of the Kripke structure K and the Büchi automaton of ψ1. Here we reduce the alphabet of the automaton by projecting all atomic propositions in *AP*_π away:

<sup>2</sup> The construction follows the one presented in [26], with a slight modification of the labeling of transitions. Labeling over atomic propositions instead of the states of the Kripke structure suffices, as any nondeterminism in the Kripke structure is inherently resolved, because we quantify over traces, not paths.

$$\varphi = \exists \pi.\, \psi\_1 \left| \begin{array}{l} A\_{\varphi} = (Q\_1 \times S \cup \{q\_0\}, \Sigma \mid AP\_{\pi}, \delta, F\_1 \times S) \\ \text{where } \delta(q\_0, \alpha) = \{(q', s') \mid q' \in \delta\_1(q\_{0, 1}, \alpha \cup \alpha'), s' \in \tau(s\_0), (L(s\_0))\_{\pi} =\_{AP\_{\pi}} \alpha' \} \\ \text{and } \delta((q, s), \alpha) = \{(q', s') \mid q' \in \delta\_1(q, \alpha \cup \alpha'), s' \in \tau(s), (L(s))\_{\pi} =\_{AP\_{\pi}} \alpha' \} \end{array} \right.$$

Given the Büchi automaton for the hyperproperty ψι ∧ ψ, it remains to construct the product with the (k+1)-self-composition of K. The transitions of the automaton are defined over labels from Σ = 2^{*AP*^∗}, where *AP*^∗ = *AP*_σ ∪ ⋃_i *AP*_{πi}, i.e., over the alphabet of A_{ψι∧ψ}. This is necessary to identify which transition was taken in each copy of K, thus mirroring a tuple of traces of K. For each of the variables π1, ..., πk and σ we use the following rule:

$$\varphi = \exists \pi.\, \psi\_1 \left| \begin{array}{l} A\_{\varphi} = (Q\_1 \times S \cup \{q\_0\}, \Sigma, \delta, F\_1 \times S) \\ \text{where } \delta(q\_0, \alpha) = \{ (q', s') \mid q' \in \delta\_1(q\_{0,1}, \alpha), s' \in \tau(s\_0), (L(s\_0))\_{\pi} =\_{AP\_{\pi}} \alpha \} \\ \text{and } \delta((q, s), \alpha) = \{ (q', s') \mid q' \in \delta\_1(q, \alpha), s' \in \tau(s), (L(s))\_{\pi} =\_{AP\_{\pi}} \alpha \} \end{array} \right.$$

Finally, we transform the resulting alternating automaton into an equivalent Büchi automaton following the construction of Miyano and Hayashi [39].

#### **4.4 Counting Models of** *ω***-Automata**

Computing the number of words accepted by a Büchi automaton can be done by examining its accepting lassos. Consider, for example, the Büchi automata over the alphabet 2^{a} in Fig. 1. The automaton on the left has one accepting lasso (q0)^ω and thus only one model, namely {a}^ω. The automaton on the right has infinitely many accepting lassos (q0 {})^i {a} (q1 ({} ∨ {a}))^ω, which accept infinitely many different words, all of the form {}^∗{a}({} ∨ {a})^ω. Computing the models of a Büchi automaton is insufficient for model checking quantitative hyperproperties, as we are not interested in the total number of models. We rather *maximize*, respectively *minimize*, over sequences of subsets of atomic propositions *the number of projected models* of the Büchi automaton. For instance, consider the automaton given in Fig. 2. The automaton has infinitely many models. However, the maximum number of sequences σ_b ∈ (2^{b})^ω that correspond to accepting lassos in the automaton with a unique sequence σ_a ∈ (2^{a})^ω is two: for a natural number n and each sequence σ_a := {}^n {a} ({})^ω, the automaton accepts exactly the two sequences {b}^n {} {b}^ω and {b}^ω over {b}. Formally, given a Büchi automaton B over *AP* and a set A ⊆ *AP*, an A*-projected model* (or projected model over A) is a sequence σ_A ∈ (2^A)^ω that is the A-projection of an accepting sequence σ ∈ (2^{*AP*})^ω.
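The A-projection itself is straightforward to state operationally. The following is a minimal sketch (our own helper, operating on finite trace prefixes purely for illustration; actual traces are infinite):

```python
def project(trace, A):
    """A-projection of a (finite prefix of a) trace: restrict each
    position to the atomic propositions in A."""
    return [frozenset(step) & frozenset(A) for step in trace]

# A finite prefix of a trace over AP = {a, b} and two of its projections.
sigma = [{"b"}, {"b"}, {"a"}, set()]
print(project(sigma, {"a"}))  # keeps only occurrences of 'a'
print(project(sigma, {"b"}))  # keeps only occurrences of 'b'
```

Two traces are counted as different projected models precisely when their projections under `project` differ at some position.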

**Fig. 1.** Büchi automata with one model (left) and infinitely many models (right).

**Fig. 2.** A two-state Büchi automaton, such that there exist exactly two {b}-projected models for each {a}-projected sequence.

In the following, we define the maximum model counting problem over automata and give an algorithm for solving the problem. We show how to use the algorithm for model checking quantitative hyperproperties.

**Definition 1 (Maximum Model Counting over Automata (MMCA)).** *Given a Büchi automaton* B *over an alphabet* 2^*AP* *for some set of atomic propositions AP and sets* X, Y, Z ⊆ *AP, the maximum model counting problem is to compute*

$$\max\_{\sigma\_Y \in (2^Y)^\omega} |\{\sigma\_X \in (2^X)^\omega \mid \exists \sigma\_Z \in (2^Z)^\omega . \sigma\_X \cup \sigma\_Y \cup \sigma\_Z \in L(B)\}|$$

*where* σ ∪ σ′ *is the point-wise union of* σ *and* σ′*.*

As a first step in our algorithm, we show how to check whether the maximum model count is equal to infinity.

**Definition 2 (Doubly Pumped Lasso).** *For a graph* G*, a doubly pumped lasso in* G *is a subgraph that contains a cycle* C1 *and a different cycle* C2 *that is reachable from* C1*.*

**Fig. 3.** Forms of doubly pumped lassos.

In general, we distinguish between two types of doubly pumped lassos, as shown in Fig. 3. We call the lassos with periods C1 and C2 the lassos of the doubly pumped lasso. A doubly pumped lasso of a Büchi automaton B is one in the graph structure of B. The doubly pumped lasso is called accepting when C2 contains an accepting state. A more general formalization of this idea is given in the following theorem.
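Under our reading of Definition 2, the structural check can be phrased via strongly connected components: an accepting doubly pumped lasso exists iff some SCC that contains a cycle through an accepting state either contains two distinct cycles itself (more internal edges than states) or is reachable from another cyclic SCC. The sketch below implements that reading; the graph encoding and all names are ours, not the paper's:

```python
def tarjan_sccs(succ):
    """Strongly connected components of a graph given as {node: successors}."""
    index, low, on_stack, stack, comps, counter = {}, {}, set(), [], [], [0]

    def visit(v):
        index[v] = low[v] = counter[0]; counter[0] += 1
        stack.append(v); on_stack.add(v)
        for w in succ.get(v, ()):
            if w not in index:
                visit(w); low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:          # v is the root of an SCC
            comp = set()
            while True:
                w = stack.pop(); on_stack.discard(w); comp.add(w)
                if w == v:
                    break
            comps.append(comp)

    for v in succ:
        if v not in index:
            visit(v)
    return comps

def has_accepting_dpl(succ, accepting):
    """Detect an accepting doubly pumped lasso: a cycle C1 from which a
    different cycle C2 containing an accepting state is reachable."""
    def internal_edges(comp):
        return sum(1 for v in comp for w in succ.get(v, ()) if w in comp)

    def reaches(src, dst):              # some node of dst reachable from src?
        seen, todo = set(src), list(src)
        while todo:
            for w in succ.get(todo.pop(), ()):
                if w in dst:
                    return True
                if w not in seen:
                    seen.add(w); todo.append(w)
        return False

    cyclic = [c for c in tarjan_sccs(succ) if internal_edges(c) > 0]
    for c2 in cyclic:
        if not accepting & c2:
            continue
        if internal_edges(c2) > len(c2):   # two distinct cycles in one SCC
            return True
        if any(c1 is not c2 and reaches(c1, c2) for c1 in cyclic):
            return True
    return False
```

As we read Fig. 1, the right automaton (self-loop on q0, edge to an accepting self-looping q1) has such a structure, while the single-state automaton on the left does not.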

**Theorem 2.** *Let* B = (Q, q0, δ, 2^*AP*, F) *be a Büchi automaton for a set of atomic propositions AP* = X ∪ Y ∪ Z*, and let* σ ∈ (2^Y)^ω*. The automaton* B *has infinitely many* X ∪ Y*-projected models* σ′ *with* σ′ =_Y σ *if and only if* B *has an accepting doubly pumped lasso with lassos* ρ *and* ρ′ *such that: (1)* ρ *is an accepting lasso, (2) tr*(ρ) =_Y *tr*(ρ′) =_Y σ*, (3) the period of* ρ′ *shares at least one state with* ρ*, and (4) tr*(ρ) ≠_X *tr*(ρ′)*.*

To check whether there is a sequence σ ∈ (2^Y)^ω such that the number of X ∪ Y-projected models σ′ of B with σ′ =_Y σ is infinite, we search for a doubly pumped lasso satisfying the constraints given in Theorem 2. This can be done by applying the following procedure:

Given a Büchi automaton B = (Q, q0, 2^*AP*, δ, F) and sets X, Y, Z ⊆ *AP*, we construct the following product automaton B× = (Q×, q×,0, 2^*AP* × 2^*AP*, δ×, F×), where Q× = Q × Q, q×,0 = (q0, q0), δ× = {(s1, s2) →^{(α,α′)} (s1′, s2′) | s1 →^α s1′, s2 →^{α′} s2′, α =_Y α′}, and F× = Q × F. The automaton B has infinitely many models σ′ if there is an accepting lasso ρ = (q0, q0)(α1, α1′) ... ((qj, qj′)(αj+1, α′j+1) ... (qk, qk′)(αk+1, α′k+1))^ω in B× such that: (1) ∃h ≤ j. qh′ = qj, i.e., B has lassos ρ1 and ρ2 that share a state in the period of ρ1, and (2) ∃h > j. αh ≠_X αh′, i.e., the lassos differ in the evaluation of X at a position after the shared state, which allows infinitely many different sequences over X for the same sequence over Y. The lasso ρ simulates a doubly pumped lasso in B satisfying the constraints of Theorem 2.

**Theorem 3.** *Given an alternating Büchi automaton* A = (Q, q0, δ, 2^*AP*, F) *for a set of atomic propositions AP* = X ∪ Y ∪ Z*, the problem of checking whether there is a sequence* σ ∈ (2^Y)^ω *such that* A *has infinitely many* X ∪ Y*-projected models* σ′ *with* σ′ =_Y σ *is* Pspace*-complete.*

The lower and upper bounds for the problem follow from reductions from and to the satisfiability problem of LTL [4]. Due to the finite structure of Büchi automata, if the number of models of the automaton exceeds the bound 2^{|Q|}, where Q is the set of states, then the automaton has infinitely many models.

**Lemma 1.** *For any Büchi automaton* B*, the number of models of* B *is either at most* 2^{|Q|} *or it is* ∞*.*

*Proof.* Assume the number of models is larger than 2^{|Q|}. Then there are more than 2^{|Q|} accepting lassos in B. By the pigeonhole principle, two of them share the same 2^{|Q|}-prefix. Thus, either they are equal or we have found a doubly pumped lasso in B.

**Corollary 1.** *Let* B *be a Büchi automaton over a set of atomic propositions AP, and let* X, Y ⊆ *AP. For each sequence* σY ∈ (2^Y)^ω*, the number of* X ∪ Y*-projected models* σ′ *with* σ′ =_Y σY *is either at most* 2^{|Q|} *or it is* ∞*.*

From Corollary 1, we know that if no sequence σY ∈ (2^Y)^ω matches infinitely many X ∪ Y-projected models, then the number of such models is bounded by 2^{|Q|}. Each of these models has a run in B that ends in an accepting strongly connected component. Also from Corollary 1, we know that every model has a lasso run of length |Q|. For each finite sequence wY of length |wY| = |Q| that reaches an accepting strongly connected component, we count the number of X ∪ Y-projected words w of length |Q| with w =_Y wY that end in an accepting


**Fig. 4.** Maximum model counting algorithm (left) and a sketch of a step in this algorithm (right): the current elements of our working sets are q1, q2 ∈ W and q3 ∈ W′. If i = 0, i.e., we are in the first step of the algorithm, then q1 and q2 are states of accepting SCCs.

strongly connected component. This number is equal to the maximum model count.

Algorithm 2 describes the procedure. An algorithm for the minimum model counting problem is defined in a similar way. The algorithm works in a backwards fashion, starting with the states of accepting strongly connected components. In each iteration i, the algorithm maps each state of the automaton to the X ∪ Y-projected words of length i that reach an accepting strongly connected component. After |Q| iterations, the algorithm determines, from the mapping of the initial state q0, a Y-projected word of length |Q| with the maximum number of matching X ∪ Y-projected words (Fig. 4).
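To make the computed quantity concrete, here is a brute-force finite-word toy of our own (it enumerates words of a fixed length instead of iterating backwards as Algorithm 2 does, but it computes the same kind of maximum; the automaton and all names are hypothetical):

```python
from collections import defaultdict
from itertools import product

def max_projected_count(delta, q0, targets, letters, length, X, Y):
    """Brute-force analogue of Algorithm 2's result: group accepted words
    of a fixed length by their Y-projection, count distinct X-projections,
    and return the maximum over all Y-projected words."""
    models = defaultdict(set)                    # Y-word -> set of X-words
    for word in product(letters, repeat=length):
        q = q0
        for a in word:                           # run the deterministic automaton
            q = delta.get((q, a))
            if q is None:
                break
        if q in targets:                         # word reaches an accepting SCC
            models[tuple(a & Y for a in word)].add(tuple(a & X for a in word))
    return max((len(xs) for xs in models.values()), default=0)

# Hypothetical 3-state automaton: first letter may be {} or {b}, second must be {a}.
E, A, B = frozenset(), frozenset({"a"}), frozenset({"b"})
delta = {("q0", E): "q1", ("q0", B): "q1", ("q1", A): "q2"}
count = max_projected_count(delta, "q0", {"q2"}, [E, A, B], 2, B, A)
# One Y-projected word ({}, {a}) matches two X-projected words: ({}, {}) and ({b}, {}).
```

The exponential enumeration is of course exactly what the backward iteration of Algorithm 2 avoids.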

**Theorem 4.** *The decision version of the maximum model counting problem over automata (MMCA), i.e., the question whether the maximum is greater than a given natural number* n*, is in* NP^{#P}*.*

*Proof.* Let a Büchi automaton B over an alphabet 2^*AP* for a set of atomic propositions *AP*, sets *AP*_X, *AP*_Y, *AP*_Z ⊆ *AP*, and a natural number n be given. We construct a nondeterministic Turing machine M with access to a #P-oracle as follows: M guesses a sequence σY over 2^{*AP*_Y}. It then queries the oracle to compute the number c = |{σX ∈ (2^{*AP*_X})^ω | ∃σZ ∈ (2^{*AP*_Z})^ω. σX ∪ σY ∪ σZ ∈ L(B)}|, which is a #P problem [27]. It remains to check whether c > n. If so, M accepts.

The following theorem summarizes the main findings of this section, which establish, depending on the property, an exponentially or even doubly exponentially better algorithm (in the quantitative bound) than the existing model checking algorithm for HyperLTL.

**Theorem 5.** *Given a Kripke structure* K *and a quantitative hyperproperty* ϕ *with bound* n*, the problem whether* K |= ϕ *can be decided in logarithmic space in the quantitative bound* n *and in polynomial space in the size of* K*.*

#### **5 A Max#Sat-Based Approach**

For existential HyperLTL formulas ψ<sup>ι</sup> and ψ, we give a more practical model checking approach by encoding the automaton-based construction presented in Sect. 4 into a propositional formula.

Given a Kripke structure K = (S, s0, τ, *AP*_K, L) and a quantitative hyperproperty ϕ = ∀π1, ..., πk. ψι → (#σ : A. ψ ◁ n) over a set of atomic propositions *AP*_ϕ ⊆ *AP*_K and a bound μ, our algorithm constructs a propositional formula φ such that every satisfying assignment of φ uniquely encodes a tuple of lassos (π1, ..., πk, σ) of length μ in K, where (π1, ..., πk) satisfies ψι and (π1, ..., πk, σ) satisfies ψ. To compute the value max_{(π1,...,πk)} |{σ_A | (π1, ..., πk, σ) ⊨ ψι ∧ ψ}| (in case ◁ ∈ {≤, <}) or min_{(π1,...,πk)} |{σ_A | (π1, ..., πk, σ) ⊨ ψι ∧ ψ}| (in case ◁ ∈ {≥, >}), we pass φ to a maximum model counter, respectively a minimum model counter, with the appropriate sets of counting and maximization (respectively minimization) propositions. From Lemma 1, we know that it is enough to consider lassos of length exponential in the size of ϕ. The size of φ is thus exponential in the size of ϕ and polynomial in the size of K.

The construction resembles the encoding of the bounded model checking approach for LTL [16]. Let ψι = ∃π′1 ... π′k′. ψ′ι and ψ = ∃π′′1 ... π′′k′′. ψ′′, and let *AP*_ψι and *AP*_ψ be the sets of atomic propositions that appear in ψι and ψ, respectively. The propositional formula φ is given as a conjunction of the following propositional formulas: φ = ⋀_{i≤k} ⟦K⟧^μ_{πi} ∧ ⟦K⟧^μ_σ ∧ ⟦ψι⟧^0_μ ∧ ⟦ψ⟧^0_μ, where:


<sup>3</sup> We omit the rules for Boolean operators due to lack of space.


In the case of an existential quantifier over a trace variable π, we add a copy of the encoding of K with new variables distinguished by π:


We define the sets X = {a^i_σ | a ∈ A, i ≤ k}, Y = {a^i | a ∈ *AP*_ψ \ A, i ≤ k}, and Z = P \ (X ∪ Y), where P is the set of all propositions in φ. The maximum model counting problem is then *MMC*(φ, X, Y, Z).

#### **5.1 Experiments**

We have implemented the Max#Sat-based model checking approach from the previous section. We compare the Max#Sat-based approach to the expansion-based approach using HyperLTL [26]. Our implementation uses the MaxCount tool [29]. We use the option in MaxCount that enumerates, rather than approximates, the number of assignments for the counting variables. We furthermore instrumented the tool so that it terminates as soon as a sample is found that exceeds the given bound. If no sample is found after one hour, we report a timeout.

Table 1 shows the results on a parameterized benchmark obtained from the implementation of an 8-bit passcode checker. The parameter of the benchmark is

**Table 1.** Comparison between the expansion-based approach (MCHyper) and the Max#Sat-based approach (MCQHyper). #max is the number of maximization variables (set Y ). #count is the number of the counting variables (set X). TO indicates a time-out after 1 h.


the bound on the number of bits that is leaked to an adversary, who might, for example, enter passcodes in a brute-force manner. In all instances, a violation is found. The results show that the Max#Sat-based approach scales significantly better than the expansion-based approach.

#### **6 Conclusion**

We have studied quantitative hyperproperties of the form ∀π1, ..., πk. ϕ → (#σ : A. ψ ◁ n), where ϕ and ψ are HyperLTL formulas, and #σ : A. ψ ◁ n compares the number of traces that differ in the atomic propositions A and satisfy ψ to a threshold n. Many quantitative information flow policies of practical interest, such as quantitative non-interference and deniability, belong to this class of properties. Our new counting-based model checking algorithm for quantitative hyperproperties performs at least exponentially better in both time and space in the bound n than a reduction to standard HyperLTL model checking. The new counting operator makes the specifications exponentially more concise in the bound, and our model checking algorithm solves the concise specifications efficiently.

We also showed that the model checking problem for quantitative hyperproperties can be solved with a practical Max#SAT-based algorithm. The SAT-based approach outperforms the expansion-based approach significantly for this class of properties. An additional advantage of the new approach is that it can handle properties like deniability, which cannot be checked by MCHyper because of the quantifier alternation.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **Exploiting Synchrony and Symmetry in Relational Verification**

Lauren Pick(B), Grigory Fedyukovich, and Aarti Gupta

Princeton University, Princeton, USA {lpick,grigoryf,aartig}@cs.princeton.edu

**Abstract.** Relational safety specifications describe multiple runs of the same program or relate the behaviors of multiple programs. Approaches to automatic relational verification often compose the programs and analyze the result for safety, but a naively composed program can lead to difficult verification problems. We propose to exploit relational specifications for simplifying the generated verification subtasks. First, we maximize opportunities for synchronizing code fragments. Second, we compute symmetries in the specifications to reveal and avoid redundant subtasks. We have implemented these enhancements in a prototype for verifying *k*-safety properties on Java programs. Our evaluation confirms that our approach leads to a consistent performance speedup on a range of benchmarks.

#### **1 Introduction**

The verification of relational program specifications is of wide interest, having many applications. Relational specifications can describe multiple runs of the same program or relate the behaviors of multiple programs. An example of the former is the verification of security properties such as non-interference, where different executions of the same program are compared to check whether there is a leak of sensitive information. The latter is useful for checking equivalence or refinement relationships between programs after applying some transformations or during iterative development of different software versions.

There is a rich history of work on the relational verification of programs. Representative efforts include those that target general analysis using relational program logics and frameworks [4,5,8,27,31] or specific applications such as security verification [1,7,9], compiler validation [16,32], and differential program analysis [17,19,21–23]. These efforts are supported by tools that range from automatic verifiers to interactive theorem-provers. In particular, many automatic verifiers are based on constructing a *composition* over the programs under consideration, where the relational property over multiple runs (of the same or different programs) is translated into a functional property over a single run of a composed program. This has the benefit that standard techniques and tools for program verification can then be applied.

However, it is also well known that a naively composed program can lead to difficult verification problems for automatic verifiers. For example, a *sequential* composition of two loops would require effective techniques for generating loop invariants. In contrast, a *parallel* composition would provide potential for aligning the loop bodies, where relational invariants may be easier to establish than a functional loop invariant. Examples of techniques that exploit opportunities for such alignment include use of type-based analysis with self-composition [29], allowing flexibility in composition to be a mix of sequential and parallel [6], exploiting structurally equivalent programs for compiler validation [32], lockstep execution of loops in reasoning using Cartesian Hoare Logic [27], and merging Horn clause rules for relational verification [13,24].

In this paper, we present a compositional framework that leverages relational specifications to further simplify the generated verification tasks on the composed program. Our framework is motivated by two main strategies. The first strategy, similar to the efforts mentioned above, is to exploit opportunities for *synchrony*, i.e., aligning code fragments across which relational invariants are easy to derive, perhaps due to functional similarity or due to similar code structure, etc. Specifically, we choose to *synchronize* the programs at conditional blocks as well as at loops. Similar to closely related efforts [6,27], we would like to execute loops in lockstep so that relational invariants can be derived over corresponding iterations over the loop bodies. Specifically, we propose a novel technique that analyzes the relational specifications to infer, under reasonable assumptions, *maximal sets of loops* that can be executed in lockstep. Synchronizing at conditional blocks in addition to loops enables simplification due to relational specifications and conditional guards that might result in infeasible or redundant subtasks. Pruning of such infeasible subtasks has been performed and noted as important in existing work [27], and synchronizing at conditional blocks allows us to prune eagerly. More importantly, aligning different programs at conditional statements sets up our next strategy.

Our second strategy is the exploitation of symmetry in relational specifications. Due to control flow divergences or non-lockstep executions of loops, even different copies of the same program may proceed along different code fragments. However, some of the resulting verification subtasks may be indistinguishable from each other due to underlying symmetries among related fragments. We analyze the relational specifications, expressed as formulas in first-order theories (e.g., linear integer arithmetic) with multi-index variables, to discover symmetries and exploit them to prune away redundant subtasks. Prior works on use of symmetry in model checking [11,14,15,20] are typically based on symmetric states satisfying the same set of indexed atomic propositions, and do not consider symmetries among different indices in specifications. To the best of our knowledge, ours is the first work to *extract* such symmetries in relational specifications, and to *use* them for pruning redundant subtasks during relational verification. For extracting these symmetries, we have lifted core ideas from symmetry-discovery and symmetry-breaking in SAT formulas [12] to richer formulas in first-order theories.

The strategies we propose for exploiting synchrony and symmetry via relational specifications are fairly general in that they can be employed in various verification methods. We provide a generic logic-based description of these strategies at a high level (Sect. 4), and also describe a specific instantiation in a verification algorithm based on forward analysis that computes strongest postconditions (Sect. 5). We have implemented our approach in a prototype tool called Synonym built on top of the Descartes tool [27]. Our experimental evaluation (Sect. 6) shows the effectiveness of our approach in improving the performance of verification in many examples (with a marginal overhead on smaller examples). In particular, exploiting symmetry is crucial in enabling verification to complete for some properties, without which Descartes exceeds a timeout on all benchmark examples.

**Fig. 1.** Example program (left), and eight possible control-flow decisions (right).

#### **2 Motivating Example**

Consider three C-like integer programs {P_j} of the form shown in Fig. 1 (left). They are identical modulo renaming, and we use indices j ∈ {1, 2, 3} as subscripts to denote variables in the different copies. We assume that each variable initially takes a nondeterministic value in each program.

A *relational verification problem* (RVP) is a tuple consisting of programs {P_j}, a relational precondition *pre*, and a relational postcondition *post*. In the example RVPs below, we consider the three conditionals, which in turn lead to eight possible control-flow decisions (Fig. 1, right) in a composed program. Each RVP reduces to subproblems for proving that *post* can be derived from *pre* for each of these control-flow decisions. In the rest of the section, we demonstrate the underlying ideas behind our approach to solving these subproblems efficiently.

*Maximizing Lockstep Execution.* Given an RVP (referred to as *RVP*1) with precondition x1 < x3 ∧ x1 > 0 ∧ i1 > 0 ∧ i2 ≥ i1 ∧ i1 = i3 (*pre*) and postcondition (x1 < x3 ∨ y1 ≠ y3) ∧ i1 > 0 ∧ i2 ≥ i1 ∧ i1 = i3 (*post*), consider a control-flow decision y1 > 20 ∧ y2 > 20 ∧ y3 > 20. This leads to another RVP, consisting of three programs of the following form:

```
assume(yj > 20); while (ij < 10) {xj *= ij; ij++;}
```

where j ∈ {1, 2, 3}, and the aforementioned *pre* and *post*. From *pre*, it follows that i1 = i3 and i2 ≥ i1. We can thus infer that the first and third loops are always executed the same number of times, while the second loop may be executed for fewer iterations. This knowledge lets us infer a single relational invariant for the first and third loops and handle the second loop separately. Clearly, the relational invariant x1 < x3 ∧ i1 = i3 ∧ i1 ≤ 10 and the nonrelational invariant i2 ≤ 10 are enough to derive *post*. If we were to handle the first and third loops separately, we would need complex nonlinear invariants such as x1 = x1,init × i1!/i1,init! and x3 = x3,init × i3!/i3,init!, which involve auxiliary variables xj,init and ij,init denoting the initial values of xj and ij, respectively.
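The lockstep argument can be sanity-checked on concrete runs; the harness below is ours and only tests the invariant on sample inputs (a verifier would establish it symbolically, for all inputs):

```python
def lockstep(x1, x3, i1):
    """Run the first and third loops in lockstep (pre gives i1 == i3) and
    check the relational invariant x1 < x3 ∧ i1 == i3 ∧ i1 <= 10 after
    every joint iteration."""
    assert 0 < x1 < x3 and i1 > 0      # pre, restricted to copies 1 and 3
    i3 = i1
    while i1 < 10:
        x1 *= i1; i1 += 1              # body of the first loop
        x3 *= i3; i3 += 1              # body of the third loop
        assert x1 < x3 and i1 == i3 and i1 <= 10
    return x1, x3

print(lockstep(1, 2, 1))               # invariant holds on every iteration
```

Since both copies multiply by the same positive i, the order x1 < x3 is preserved in every joint step, which is exactly why the single relational invariant suffices.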

*Symmetry-Breaking.* For the same program, and an RVP (referred to as *RVP*2) with precondition i1 > 0 ∧ i2 ≥ i1 ∧ i1 = i3 and postcondition i1 > 0 ∧ i2 ≥ i1 ∧ i1 = i3, consider a control-flow decision y1 > 20 ∧ y2 > 20 ∧ y3 ≤ 20. We generate another RVP involving the following set of programs:

```
assume(y1 > 20); while (i1 < 10) {x1 *= i1; i1++;}
assume(y2 > 20); while (i2 < 10) {x2 *= i2; i2++;}
assume(y3 ≤ 20); while (i3 < 10) {x3++; i3++;}
```

Similarly, the decision y1 ≤ 20 ∧ y2 > 20 ∧ y3 > 20 generates yet another RVP over the following:

```
assume(y1 ≤ 20); while (i1 < 10) {x1++; i1++;}
assume(y2 > 20); while (i2 < 10) {x2 *= i2; i2++;}
assume(y3 > 20); while (i3 < 10) {x3 *= i3; i3++;}
```

Both RVPs have the same precondition and postcondition as *RVP*2. The two RVPs differ only in their subscripts: taking one and swapping the subscripts 1 and 3 yields the other, a symmetry. Thus, knowing the verification result for either RVP allows us to skip verifying the other one, by discovering and exploiting such symmetries.

#### **3 Background and Notation**

Given a loop-free program over input variables x and output variables y (such that x and y are disjoint), let *Tr* (x, y) denote its symbolic encoding.

**Proposition 1.** *Given two loop-free programs $Tr\_1(\vec{x}\_1, \vec{y}\_1)$ and $Tr\_2(\vec{x}\_2, \vec{y}\_2)$, a precondition $pre(\vec{x}\_1, \vec{x}\_2)$, and a postcondition $post(\vec{y}\_1, \vec{y}\_2)$, the task of relational verification is reduced to checking validity of the following formula:*

$$pre(\vec{x}\_1, \vec{x}\_2) \land Tr\_1(\vec{x}\_1, \vec{y}\_1) \land Tr\_2(\vec{x}\_2, \vec{y}\_2) \implies post(\vec{y}\_1, \vec{y}\_2)$$
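For intuition, the validity check of Proposition 1 can be mimicked by exhaustive enumeration over a tiny finite domain. The two programs and the specification below are our own toy choices (an equivalence-style RVP), not taken from the paper's examples:

```python
from itertools import product

# Illustrative check of Proposition 1 by exhaustive enumeration over a small
# finite domain, standing in for an SMT validity query. The two loop-free
# programs (both computing y = x + 1) and the specification
# pre: x1 = x2, post: y1 = y2 are toy choices for this sketch.
def tr1(x1):                 # concrete execution of the encoding Tr1(x1, y1)
    return x1 + 1

def tr2(x2):                 # concrete execution of the encoding Tr2(x2, y2)
    return x2 + 1

def relational_valid(domain=range(-3, 4)):
    """pre ∧ Tr1 ∧ Tr2 => post must hold for every input pair."""
    for x1, x2 in product(domain, repeat=2):
        y1, y2 = tr1(x1), tr2(x2)
        if x1 == x2 and y1 != y2:    # a counterexample to the implication
            return False
    return True

assert relational_valid()
```

An SMT solver replaces the enumeration by checking unsatisfiability of the negated implication.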

Given a program with one loop (i.e., a transition system) over input variables x and output variables y, let *Init*(x, u) denote a symbolic encoding of the block of code before the loop, *Guard*(u) denote the loop guard, and *Tr* (u, y) encode the loop body. Here, u is the vector of local variables that are live at the loop guard. For example, consider the program from our motivating example:

```
assume(y1 > 20); while (i1 < 10) {x1 *= i1; i1++;}
```
In its encoding, $\vec{x} = \vec{u} = (i\_1, x\_1, y\_1)$, $\vec{y} = (i'\_1, x'\_1)$, $Init(\vec{x}, \vec{u}) = y\_1 > 20$, $Guard(\vec{u}) = i\_1 < 10$, and $Tr(\vec{u}, \vec{y}) = (x'\_1 = x\_1 \times i\_1 \land i'\_1 = i\_1 + 1)$.

**Proposition 2 (Naive parallel composition).** *Given two loopy programs, $\langle Init(\vec{x}\_1, \vec{u}\_1), Guard\_1(\vec{u}\_1), Tr\_1(\vec{u}\_1, \vec{y}\_1)\rangle$ and $\langle Init(\vec{x}\_2, \vec{u}\_2), Guard\_2(\vec{u}\_2), Tr\_2(\vec{u}\_2, \vec{y}\_2)\rangle$, a precondition $pre(\vec{x}\_1, \vec{x}\_2)$, and a postcondition $post(\vec{y}\_1, \vec{y}\_2)$, the task of relational verification is reduced to the task of finding (individual) inductive invariants $I\_1$ and $I\_2$:*

$$\begin{aligned} pre(\vec{x}\_1, \vec{x}\_2) \land Init(\vec{x}\_1, \vec{u}\_1) &\implies I\_1(\vec{u}\_1) \\ pre(\vec{x}\_1, \vec{x}\_2) \land Init(\vec{x}\_2, \vec{u}\_2) &\implies I\_2(\vec{u}\_2) \\ I\_1(\vec{u}\_1) \land Guard\_1(\vec{u}\_1) \land Tr\_1(\vec{u}\_1, \vec{y}\_1) &\implies I\_1(\vec{y}\_1) \\ I\_2(\vec{u}\_2) \land Guard\_2(\vec{u}\_2) \land Tr\_2(\vec{u}\_2, \vec{y}\_2) &\implies I\_2(\vec{y}\_2) \\ I\_1(\vec{y}\_1) \land I\_2(\vec{y}\_2) \land \neg Guard\_1(\vec{y}\_1) \land \neg Guard\_2(\vec{y}\_2) &\implies post(\vec{y}\_1, \vec{y}\_2) \end{aligned}$$

Note that the method of naive composition requires handling of multiple invariants, which is known to be difficult. Furthermore, it might lose some important relational information specified in *pre*(x1, x<sup>2</sup>). One way to avoid this is to exploit the fact that loops could be executed in lockstep.

**Proposition 3 (Lockstep composition).** *Given two loopy programs, $\langle Init(\vec{x}\_1, \vec{u}\_1), Guard\_1(\vec{u}\_1), Tr\_1(\vec{u}\_1, \vec{y}\_1)\rangle$ and $\langle Init(\vec{x}\_2, \vec{u}\_2), Guard\_2(\vec{u}\_2), Tr\_2(\vec{u}\_2, \vec{y}\_2)\rangle$, a precondition $pre(\vec{x}\_1, \vec{x}\_2)$, and a postcondition $post(\vec{y}\_1, \vec{y}\_2)$, if both loops iterate exactly the same number of times, then the task of relational verification is reduced to the task of finding one (relational) inductive invariant $I$:*

$$\begin{aligned} pre(\vec{x}\_1, \vec{x}\_2) \land Init(\vec{x}\_1, \vec{u}\_1) \land Init(\vec{x}\_2, \vec{u}\_2) &\implies I(\vec{u}\_1, \vec{u}\_2) \\ I(\vec{u}\_1, \vec{u}\_2) \land Guard\_1(\vec{u}\_1) \land Tr\_1(\vec{u}\_1, \vec{y}\_1) \land Guard\_2(\vec{u}\_2) \land Tr\_2(\vec{u}\_2, \vec{y}\_2) &\implies I(\vec{y}\_1, \vec{y}\_2) \\ I(\vec{y}\_1, \vec{y}\_2) \land \neg Guard\_1(\vec{y}\_1) \land \neg Guard\_2(\vec{y}\_2) &\implies post(\vec{y}\_1, \vec{y}\_2) \end{aligned}$$
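The three proof obligations of Proposition 3 can be checked by brute force on a toy instance; the loop, specification, and candidate invariant below are illustrative choices of ours, and a real implementation would discharge the implications with an SMT solver:

```python
from itertools import product

# Finite-domain sanity check of Proposition 3's three conditions for two
# copies of the toy loop "while (i < 3) { x += i; i++ }", with
# pre: i1 = i2 = 0 and x1 = x2, post: x1 = x2, and candidate relational
# invariant I(u1, u2) := (i1 = i2 and x1 = x2).
D = range(0, 6)
states = [(i, x) for i in D for x in D]      # u = (i, x)

def guard(s):
    return s[0] < 3

def step(s):                                 # Tr: one loop iteration
    i, x = s
    return (i + 1, x + i)

def inv(s1, s2):                             # I(u1, u2)
    return s1 == s2

# Initiation: pre ∧ Init1 ∧ Init2 => I (both copies start at (0, x))
initiation = all(inv((0, x), (0, x)) for x in D)

# Consecution: I ∧ Guard1 ∧ Tr1 ∧ Guard2 ∧ Tr2 => I (one lockstep step)
consecution = all(inv(step(s1), step(s2))
                  for s1, s2 in product(states, repeat=2)
                  if inv(s1, s2) and guard(s1) and guard(s2))

# Exit: I ∧ ¬Guard1 ∧ ¬Guard2 => post (x1 = x2)
exit_post = all(s1[1] == s2[1]
                for s1, s2 in product(states, repeat=2)
                if inv(s1, s2) and not guard(s1) and not guard(s2))

assert initiation and consecution and exit_post
```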

In this paper, we do not focus on a specific method for deriving these invariants – a plethora of suitable methods have been proposed in the literature, and any of these could be used.

#### **4 Leveraging Relational Specifications**

In this section, we describe the main components of our compositional framework where we leverage relational specifications to simplify the verification subtasks. We first describe our novel algorithm for inferring maximal sets of loops that can be executed in lockstep (Sect. 4.1). Next, we describe our technique for handling conditionals (Sect. 4.2). While this is similar to other prior work, the main purpose here is to set the stage for our novel methods for exploiting symmetry (Sect. 4.3).

#### **4.1 Synchronizing Loops**

Given a set of loopy programs, we would like to determine which ones can be executed in lockstep. As mentioned earlier, relational invariants over lockstep loops are often easier to derive than loop invariants over a single copy.

Our algorithm CheckLockstep takes as input a set of loopy programs {P<sup>1</sup>,...,Pk} and outputs a set of *maximal* classes of programs that can be executed in lockstep. The algorithm partitions its input set of programs and recursively calls CheckLockstep on the partitions.

First, CheckLockstep infers a relational inductive invariant over the loop bodies, synthesizing $I(\vec{u}\_1,\dots,\vec{u}\_k)$ in the following:

$$\operatorname{pre}(\vec{x}\_1, \dots, \vec{x}\_k) \land \bigwedge\_{i=1}^k \operatorname{Init}(\vec{x}\_i, \vec{u}\_i) \implies I(\vec{u}\_1, \dots, \vec{u}\_k)$$

$$I(\vec{u}\_1, \dots, \vec{u}\_k) \land \bigwedge\_{i=1}^k \operatorname{Guard}\_i(\vec{u}\_i) \land \operatorname{Tr}\_i(\vec{u}\_i, \vec{y}\_i) \implies I(\vec{y}\_1, \dots, \vec{y}\_k)$$

CheckLockstep then poses the following query:

$$\neg \left( \left( I(\vec{u}\_1, \ldots, \vec{u}\_k) \land \bigvee\_{i=1}^k \neg Guard\_i(\vec{u}\_i) \right) \implies \bigwedge\_{i=1}^k \neg Guard\_i(\vec{u}\_i) \right) \tag{1}$$

The left-hand side of the implication holds whenever one of the loops has terminated (the relational invariant holds, and at least one of the loop conditions must be false), and the right-hand side holds only if all of the loops have terminated. If the formula is unsatisfiable, then the termination of one loop implies the termination of all loops, and all loops can be executed simultaneously [27]. In this case, the entire set of input programs is one maximal class, and the set containing the set of all input programs is returned.

Otherwise, CheckLockstep gets a satisfying assignment and partitions the input programs into a set *Terminated* and a set *Unfinished*. The *Terminated* set contains all programs $P\_i$ whose guards $Guard\_i(\vec{u}\_i)$ are false in the model for the formula, and the *Unfinished* set contains the remaining programs. The CheckLockstep algorithm is then called recursively on both *Terminated* and *Unfinished*, with its final result being the union of the two sets returned by these recursive calls.
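The recursion of CheckLockstep can be sketched as follows, with invariant synthesis and the query of Eq. 1 hidden behind a caller-supplied `find_witness` stub (all names, and the loop-bound stand-in below, are our own illustration):

```python
# Sketch of the CheckLockstep recursion. The invariant synthesis and the
# satisfiability query of Eq. 1 are abstracted behind find_witness: it
# returns None when all programs in `progs` can run in lockstep, and
# otherwise the subset whose guards are false in some model of the query.
def check_lockstep(progs, find_witness):
    progs = frozenset(progs)
    if len(progs) <= 1:
        return [progs]
    witness = find_witness(progs)
    if witness is None:
        return [progs]                       # one maximal lockstep class
    terminated = frozenset(witness)          # guards false in the model
    unfinished = progs - terminated
    return (check_lockstep(terminated, find_witness) +
            check_lockstep(unfinished, find_witness))

# Toy stand-in: each program is a loop with a fixed iteration count; loops
# run in lockstep iff their counts agree, and in any separating model the
# shortest loops have already terminated.
bounds = {"P1": 10, "P2": 10, "P3": 5}

def by_bound(progs):
    bs = {bounds[p] for p in progs}
    return None if len(bs) == 1 else {p for p in progs if bounds[p] == min(bs)}

classes = check_lockstep(bounds, by_bound)
# classes partitions {P1, P2, P3} into the maximal classes {P3} and {P1, P2}
```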

The following theorem assumes that any relational invariant $I(\vec{u}\_1,\dots,\vec{u}\_k)$, generated externally and used by the algorithm, is stronger than any relational invariant $I(\vec{u}\_1,\dots,\vec{u}\_{i-1},\vec{u}\_{i+1},\dots,\vec{u}\_k)$ that could be synthesized over the same set of $k$ loops with the $i$-th loop removed.

**Theorem 1.** *For any call to* CheckLockstep*, it always partitions its set of input programs such that for all $P\_i \in$ Terminated *and* $P\_j \in$ Unfinished*, $P\_i$ and $P\_j$ cannot be executed in lockstep.*

*Proof.* Assume that CheckLockstep has partitioned its set of programs into the *Terminated* and *Unfinished* sets. Let $P\_i \in$ *Terminated* and $P\_j \in$ *Unfinished* be arbitrary programs. Based on how the partitioning is performed, we know that there is a model for Eq. 1 such that $Guard(\vec{u}\_i)$ does not hold and $Guard(\vec{u}\_j)$ does. We can thus conclude that the following formula is satisfiable:

$$\neg\left(I(\vec{u}\_1, \dots, \vec{u}\_k) \land \neg Guard(\vec{u}\_i) \implies \neg Guard(\vec{u}\_j)\right)$$

From the assumption on our invariant synthesizer, we conclude that the following is also satisfiable, indicating that $P\_i$ and $P\_j$ cannot be executed in lockstep:

$$\neg\left(I(\vec{u}\_i, \vec{u}\_j) \land \neg Guard(\vec{u}\_i) \implies \neg Guard(\vec{u}\_j)\right),$$

where $I(\vec{u}\_i, \vec{u}\_j)$ is the relational invariant for $P\_i$ and $P\_j$ that our invariant synthesizer infers. □

#### **4.2 Synchronizing Conditionals**

Let two programs be of the form if $Q\_i$ then $R\_i$ else $S\_i$, where $i \in \{1, 2\}$ and $R\_i$ and $S\_i$ are arbitrary blocks of code that may themselves contain loops. Let them be part of some RVP, which reduces, depending on the content of each block of code, to applying Proposition 1, 2, or 3 to four pairs of programs. As we have seen in previous sections, each of the four verification tasks could be expensive. To reduce the number of verification tasks where possible, we use the relational preconditions to filter out pairs of programs for which verification conclusions can be derived trivially.

For $k$ programs of the form if $Q\_i$ then $R\_i$ else $S\_i$ for $i \in \{1,\dots,k\}$ and precondition $pre(\vec{x}\_1,\dots,\vec{x}\_k)$, we can simultaneously generate all possible combinations of decisions by querying a solver for all truth assignments to the $Q\_i$'s:

$$pre(\vec{x}\_1, \dots, \vec{x}\_k) \land \bigwedge\_{i=1}^k Q\_i \tag{2}$$

We can then use the result of this All-SAT query to generate sets of programs in subtasks. For each assignment $j$, where each $Q\_i$ is assigned a Boolean value $v\_i$, the following set is generated: $\{\mathsf{assume}(V\_1); U\_1, \dots, \mathsf{assume}(V\_k); U\_k\}$, where for each $i \in \{1,\dots,k\}$, if $v\_i = true$ then $V\_i = Q\_i$ and $U\_i = R\_i$, else $V\_i = \neg Q\_i$ and $U\_i = S\_i$. We need to apply our verification algorithm on only the resulting sets of programs. For example, in our above RVP, if $Q\_1$ is equivalent to $Q\_2$ in all solutions, then the RVP reduces to verification of just two pairs of programs:

$$\begin{aligned} \mathsf{assume}(\mathsf{Q}\_1);\ \mathsf{R}\_1 \quad &\text{and} \quad \mathsf{assume}(\mathsf{Q}\_2);\ \mathsf{R}\_2\\ \mathsf{assume}(\neg\mathsf{Q}\_1);\ \mathsf{S}\_1 \quad &\text{and} \quad \mathsf{assume}(\neg\mathsf{Q}\_2);\ \mathsf{S}\_2 \end{aligned}$$
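As a concrete (hypothetical) instance of this All-SAT-based filtering, the following sketch enumerates a small input domain for $k = 2$ programs with branch conditions $Q\_i := x\_i > 0$ under the precondition $x\_1 = x\_2$; only the synchronized decision vectors survive:

```python
from itertools import product

# Brute-force stand-in for the All-SAT query of Eq. 2, on a hypothetical RVP
# with k = 2, branch conditions Q_i := (x_i > 0), and relational precondition
# pre := (x1 = x2). Enumerating a small input domain recovers the feasible
# truth assignments to (Q1, Q2); an SMT solver would enumerate them directly.
domain = range(-2, 3)
feasible = set()
for x1, x2 in product(domain, repeat=2):
    if x1 == x2:                        # pre(x1, x2)
        feasible.add((x1 > 0, x2 > 0))  # decision vector (Q1, Q2)

# Only the synchronized branch combinations survive, so the RVP splits into
# two verification subtasks (then/then and else/else) instead of four.
assert feasible == {(True, True), (False, False)}
```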

**Algorithm 1.** Algorithm for constructing a graph to find symmetries.

```
1: procedure MakeGraph(F)
2:   (V, E) ← ({v_1^Id, ..., v_k^Id}, ∅) where each v_i^Id has color(v_i^Id) = Id
3:   for d ∈ Clauses(F) do (V, E) ← MakeColoredAST(d) ∪ (V, E)
4:   for v ∈ V with x_i ∈ vars(color(v)) do
5:     V ← (V \ {v}) ∪ {Recolor(v, v[x_i → x])}
6:     E ← E ∪ {(v, v_i^Id)}
```

**Fig. 2.** Graph with vertex names (outside the vertices) and colors (inside the vertices).

#### **4.3 Discovering and Exploiting Symmetries**

Using the All-SAT query from Eq. 2 allows us to prune trivial RVPs. However, as we have seen in Sect. 2, some of the remaining RVPs could be regarded as equivalent due to symmetry. First, we discuss how to identify symmetries in formulas syntactically, and then we show how to use such symmetries.

#### **4.3.1 Identifying Symmetries in Formulas**

Formally, symmetries in formulas are defined as permutations. Note that any permutation $\pi$ of the set $\{1,\dots,k\}$ can be lifted to a permutation of the set $\{\vec{x}\_1,\dots,\vec{x}\_k\}$.

**Definition 1 (Symmetry).** *Let $\vec{x}\_1,\dots,\vec{x}\_k$ be vectors of the same size over disjoint sets of variables. A* symmetry $\pi$ *of a formula $F(\vec{x}\_1,\dots,\vec{x}\_k)$ is a permutation of the set $\{\vec{x}\_i \mid 1 \le i \le k\}$ such that $F(\vec{x}\_1,\dots,\vec{x}\_k) \iff F(\pi(\vec{x}\_1),\dots,\pi(\vec{x}\_k))$.*

The task of finding symmetries within a set of formulas can be performed syntactically by first canonicalizing the formulas, converting the formulas into a graph representation of their syntax, and then using a graph automorphism algorithm to find the symmetries of the graph. We demonstrate how this can be done for a formula ϕ over Linear Integer Arithmetic with the following example.

Let $\varphi = (x\_1 \le x\_2 \land x\_3 \le x\_4) \land (x\_1 < z\_2 \lor x\_3 < z\_4)$. Note that this formula is symmetric under a permutation of the subscripts that simultaneously swaps 1 with 3 and 2 with 4. Let $\{(x\_1, z\_1), (x\_2, z\_2), (x\_3, z\_3), (x\_4, z\_4)\}$ be the vectors of variables. We identify a vector by its subscript (e.g., we identify $(x\_1, z\_1)$ by 1).

Our algorithm starts by canonicalizing the formula: $\varphi = (x\_1 < x\_2 \lor x\_1 = x\_2) \land (x\_3 < x\_4 \lor x\_3 = x\_4) \land (x\_1 < z\_2 \lor x\_3 < z\_4)$. It then constructs a colored graph for the canonicalized formula with the procedure in Algorithm 1. The algorithm initializes a graph with the set of $k$ vertices $v\_1^{Id},\dots,v\_k^{Id}$ of color *Id* (vertices 21–24 in Fig. 2), where $k$ is the number of identifiers. It then (Line 3) adds to the graph the union of the abstract syntax trees (ASTs) of the formula's conjuncts, where each vertex has a color corresponding to the type of its AST node. If a parent vertex has the color of an ordering-sensitive operation or predicate, then its children have colors that include a tag to indicate their ordering (e.g., vertices 9 and 10 in Fig. 2 have colors with tags because their parent has color $<$, but vertices 11 and 12 do not have tags because their parent has color $=$). Next (Line 4), the algorithm renames vertex colors so that each indexed variable name $x\_i$ is replaced with a non-indexed version $x$, while simultaneously adding edges from each vertex with a renamed color to $v\_i^{Id}$. The resulting graph for $\varphi$ is shown in Fig. 2. Finally, the algorithm applies a graph automorphism finder to get the following automorphism (in addition to the identity automorphism), shown here in cyclic notation, where $(x\ y)$ means that $x \mapsto y$ and $y \mapsto x$ (vertices that map to themselves are omitted):

$$(0\ 1)(3\ 5)(4\ 6)(7\ 8)(9\ 13)(10\ 14)(11\ 15)(12\ 16)(17\ 19)(18\ 20)(21\ 23)(22\ 24)$$

We are only interested in permutations of the vectors, so we project out the relevant parts of the permutation (21 23)(22 24) and map them back to our vector identifiers to get the following permutation on the identifiers:

$$\pi = \{1 \mapsto 3, 2 \mapsto 4, 3 \mapsto 1, 4 \mapsto 2\}$$
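Symmetries can also be found without a graph automorphism tool, by directly checking each candidate permutation for formula equivalence (the simple approach our evaluation in Sect. 6 falls back on). A brute-force sketch for the example $\varphi$, where exhaustive evaluation over a small finite domain stands in for a real equivalence query:

```python
from itertools import permutations, product

# For each permutation p of the vector identifiers {1, 2, 3, 4}, check that
# phi and its permuted version agree on every assignment over a small domain.
IDS = (1, 2, 3, 4)
VARS = [("x", i) for i in IDS] + [("z", i) for i in IDS]

def phi(v, p):
    """Evaluate phi with each vector identifier i renamed to p[i]."""
    x = {i: v[("x", p[i])] for i in IDS}
    z = {i: v[("z", p[i])] for i in IDS}
    return (x[1] <= x[2] and x[3] <= x[4]) and (x[1] < z[2] or x[3] < z[4])

def is_symmetry(p, domain=range(3)):
    ident = {i: i for i in IDS}
    return all(phi(dict(zip(VARS, vals)), ident) == phi(dict(zip(VARS, vals)), p)
               for vals in product(domain, repeat=len(VARS)))

syms = [perm for perm in permutations(IDS) if is_symmetry(dict(zip(IDS, perm)))]
# syms == [(1, 2, 3, 4), (3, 4, 1, 2)]: the identity and the permutation
# pi = {1 -> 3, 2 -> 4, 3 -> 1, 4 -> 2} recovered by the automorphism approach.
```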

#### **4.3.2 Exploiting Symmetries**

We now define the notion of symmetric RVPs and the application of symmetry-breaking to generate a single representative per equivalence class of RVPs.

**Definition 2 (Symmetric RVPs).** *Two RVPs $\langle Ps, pre(\vec{x}\_1,\dots,\vec{x}\_k), post(\vec{y}\_1,\dots,\vec{y}\_k)\rangle$ and $\langle Ps', pre(\vec{x}\_1,\dots,\vec{x}\_k), post(\vec{y}\_1,\dots,\vec{y}\_k)\rangle$, where $Ps = \{P\_1,\dots,P\_k\}$ and $Ps' = \{P'\_1,\dots,P'\_k\}$, are called* symmetric *under a permutation $\pi$ iff $\pi$ is a symmetry of both $pre$ and $post$, and for each $i \in \{1,\dots,k\}$ the encoding of $P'\_i$ is equivalent to the encoding of $P\_{\pi(i)}$.*
As we have seen in Sect. 4.3.1, identification of symmetries could be made purely on the syntactic level of the relational preconditions and postconditions. For each detected symmetry, it remains to check equivalence between the corresponding programs' encodings, which can be formulated as an SMT problem.

To exploit symmetries, we propose a simple but intuitive approach. First, we identify the set of symmetries using *pre* ∧ *post*. Then, we solve the All-SAT query from Eq. 2 and get a *reduced* set *R* of RVPs (i.e., one without all trivial problems). For each $RVP\_i \in R$, we perform the relational verification only if no symmetric $RVP\_j \in R$ has already been verified. Thus, the most expensive part of the routine, checking equivalence of RVPs, is performed on demand and only on a subset of all possible pairs $RVP\_i$, $RVP\_j$.

Alternatively, in some cases (e.g., for parallelizing the algorithm) it might help to identify all symmetric RVPs prior to solving the All-SAT query from Eq. 2. From this set, we can generate symmetry-breaking predicates (SBPs) [12] and conjoin them to Eq. 2. Constrained with SBPs, this query will have fewer models, and will contain a single representative per equivalence class of RVPs. We describe how to construct SBPs in more detail in the next section.

#### **4.3.3 Generating Symmetry-Breaking Predicates (SBPs)**

SBPs have previously been applied in pruning the search space explored by SAT solvers. Traditionally, techniques construct SBPs based on symmetries in truth assignments to the literals in the formula, but SBP-construction can be adapted to be based on symmetries in truth assignments to conditionals, allowing us to break symmetries in our setting.

We can construct an SBP by treating each condition the way a literal is treated in existing SBP constructions. In particular, we can construct the common Lex-Leader SBP used for predicate logic [12], which in our case forces a solver to choose the lexicographically least representative per equivalence class for a particular ordering of the conditions. For the ordering of conditions where $Q\_i \le Q\_j$ iff $i \le j$ and a set of symmetries $S$ over $\{1,\dots,k\}$, we can construct a Lex-Leader SBP $SBP(S) = \bigwedge\_{\pi \in S} PP(\pi)$ with the more efficient predicate chaining construction [2], where we have that

$$PP(\pi) = p\_{\min(I)} \land \bigwedge\_{i \in I} \left( p\_i \implies \left( g\_{prev(i,I)} \implies \left( l\_i \land p\_{next(i,I)} \right) \right) \right)$$

and that $I$ is the support of $\pi$ with the last condition of each cycle removed, $\min(I)$ is the minimal element of $I$, $prev(i, I)$ is the maximal element of $I$ still less than $i$, or 0 if there is none, $next(i, I)$ is the minimal element of $I$ still greater than $i$, or 0 if there is none, $p\_0 = g\_0 = true$, $p\_i$ is a fresh predicate for $i \neq 0$, $g\_i = (Q\_{\pi(i)} \implies Q\_i)$ for $i \neq 0$, and $l\_i = (Q\_i \implies Q\_{\pi(i)})$.

After constructing the SBP, we conjoin it to the All-SAT query in Eq. 2. Our solver now generates sets of programs that, when combined with the relational precondition and postcondition, form a set of irredundant RVPs.

*Example.* Let us consider how SBPs can be applied to $RVP\_2$ from Sect. 2 to avoid generating two of the eight RVPs we would otherwise generate.

First, we see that our three programs are all copies of the same program and are at the same program point, so they will have the same encoding. Next, we find the set of permutations $S$ over $\{1, 2, 3\}$ such that for each $\pi \in S$, we have that $i\_1 > 0 \land i\_2 \ge i\_1 \land i\_1 = i\_3$ iff $i\_{\pi(1)} > 0 \land i\_{\pi(2)} \ge i\_{\pi(1)} \land i\_{\pi(1)} = i\_{\pi(3)}$. In this case, $S$ is the set of permutations $\{\{1 \mapsto 1, 2 \mapsto 2, 3 \mapsto 3\}, \{1 \mapsto 3, 2 \mapsto 2, 3 \mapsto 1\}\}$. Now, we construct a Lex-Leader SBP (using the predicate chaining construction described above):

$$(p\_1 \land (p\_1 \implies ((y\_1 > 20) \implies (y\_3 > 20))))$$

where $p\_1$ is a fresh predicate. Conjoining this SBP to Eq. 2 leads to the RVPs arising from the control-flow decisions $y\_1 > 20 \land y\_2 > 20 \land y\_3 \le 20$ and $y\_1 > 20 \land y\_2 \le 20 \land y\_3 \le 20$ no longer being generated.
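To see the pruning concretely, the following sketch enumerates the eight decision combinations and filters them with the SBP; since the fresh predicate $p\_1$ is forced to true by the leading conjunct, the SBP acts as the Boolean constraint $Q\_1 \implies Q\_3$ (this encoding is our own illustration):

```python
from itertools import product

# q1, q2, q3 model the branch decisions y1 > 20, y2 > 20, y3 > 20 of the
# three program copies. After eliminating the fresh predicate p1, the
# Lex-Leader SBP for the symmetry swapping copies 1 and 3 reduces to Q1 => Q3.
def sbp(q1, q2, q3):
    return (not q1) or q3            # Q1 => Q3

decisions = list(product([True, False], repeat=3))
kept = [d for d in decisions if sbp(*d)]
pruned = [d for d in decisions if not sbp(*d)]
# pruned == [(True, True, False), (True, False, False)]: exactly the two
# control-flow decisions whose symmetric counterparts are still generated.
```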

#### **5 Instantiation of Strategies in Forward Analysis**

We now describe an instantiation of our proposed strategies in a verification algorithm based on forward analysis using a strongest-postcondition computation. Other instantiations, e.g., on top of a Horn solver based on Property-Directed Reachability [24] are possible, but outside the scope of this work.

```
1: procedure Verify(pre, Current, Ifs, Loops, post)
2:   while Current ≠ ∅ do
3:     if ProcessStatement(pre, P_i, Ifs, Loops, post) = safe then return safe
4:   if Loops ≠ ∅ then HandleLoops(pre, Loops, post)
5:   else if Ifs ≠ ∅ then HandleIfs(pre, Ifs, Loops, post)
6:   else return unsafe
```

Given an RVP in the form of a Hoare triple $\{Pre\}\ P\_1 || \cdots || P\_k\ \{Post\}$, where $||$ denotes parallel composition, the top-level Verify procedure takes as input the relational specification *pre* = *Pre* and *post* = *Post*, the set of input programs *Current* $= \{P\_1,\dots,P\_k\}$, and empty sets *Loops* and *Ifs*. It uses a strongest-postcondition computation to compute the next Hoare triple at each step until it can conclude the validity of the original Hoare triple.

*Synchronization.* Throughout verification, the algorithm maintains three disjoint sets of programs: one for programs that are currently being processed (*Current*), one for programs that have been processed up until a loop (*Loops*), and one for programs that have been processed up until a conditional statement (*Ifs*). The algorithm processes statements in each program independently, with ProcessStatement choosing an arbitrary interleaving of statements from the programs in *Current*. When the algorithm encounters the end of a program in its call to ProcessStatement, it removes this program from the *Current* set. At this point, the algorithm returns safe if the current Hoare triple is proven valid. When a program has reached a point of control-flow divergence and is processed by ProcessStatement, it is removed from *Current* and added to the appropriate set (*Loops* or *Ifs*).

*Handling Loops.* Once all programs are in the *Loops* or *Ifs* sets (i.e., *Current* = ∅), the algorithm handles the programs in the *Loops* set if it is nonempty. HandleLoops behaves like CheckLockstep but computes postconditions where possible; when a set of loops can be executed in lockstep, HandleLoops computes their postconditions before placing the programs into the *Terminated* set. After all loops have been placed in the *Terminated* set and a new precondition *pre′* has been computed, rather than returning *Terminated*, HandleLoops invokes Verify(*pre′*, *Terminated*, *Ifs*, ∅, *post*).

*Handling Conditionals.* When *Current* = *Loops* = ∅, Verify handles conditional statements. HandleIfs exploits symmetries by using the All-SAT query with Lex-Leader SBPs as described in Sect. 4 and calls Verify on each generated verification problem.

#### **6 Implementation and Evaluation**

To evaluate the effectiveness of increased lockstep execution of loops and symmetry-breaking, we implemented our algorithm from Sect. 5 on top of the <sup>D</sup>escartes tool for verifying k-safety properties, i.e., RVPs over k identical Java programs. We implemented two variants: Syn uses only synchrony (i.e., no symmetry is used), while Synonym uses both. All implementations (including Descartes) use the same guess-and-check invariant generator (the same originally used by Descartes, but modified to generate more candidate invariants). In Synonym, we compute symmetries in preconditions and postconditions only when all program copies are the same. For our examples, it sufficed to compute symmetries simply by checking if each possible permutation leads to equivalent formulas<sup>1</sup>. We compare the performance of our prototype implementations to Descartes<sup>2</sup>. We use two metrics for comparison: the time taken and the number of Hoare triples processed by the verification procedure. All experiments were conducted on a MacBook Pro, with a 2.7 GHz Intel Core i5 processor and 8 GB RAM.

#### **6.1 Stackoverflow Benchmarks**

The first set of benchmarks we consider are the Stackoverflow benchmarks originally used to evaluate Descartes. These implement (correctly or incorrectly) the Java Comparator or Comparable interface; we check whether or not their *compare* functions satisfy the following properties:

<sup>1</sup> Our implementation includes the syntactic symmetry-finding algorithm from Sect. 4.3.1, though we do not use it for evaluation here due to its high overhead in using an external tool for finding graph automorphisms.

<sup>2</sup> While there are several tools for relational verification (e.g. Rosette/Unbound [25], VeriMapRel [13], Reve [17], MoCHi [17], SymDiff [22]), most of these do not handle Java programs, and to the best of our knowledge, none of these tools has support for *k*-safety verification for *k* greater than 2.

P1: $\forall x, y.\ sgn(compare(x, y)) = -sgn(compare(y, x))$

P2: $\forall x, y, z.\ (compare(x, y) > 0 \land compare(y, z) > 0) \implies compare(x, z) > 0$

P3: $\forall x, y, z.\ (compare(x, y) = 0) \implies (sgn(compare(x, z)) = sgn(compare(y, z)))$

(One of the original 34 Stackoverflow examples is excluded from our evaluation here because of the inability of the invariant generator to produce a suitable invariant.) We compare the results of running Syn and Synonym vs. Descartes for each property in Table 1. (Expanded versions and plots of these results are available in an extended version of the paper [26].)

Because property P1 contains a symmetry, we observe an improvement in the number of Hoare triples for this property; however, the overhead of computing symmetries leads to Synonym running more slowly than Syn even on some examples with reduced Hoare triple counts. Property P1 is also the easiest to prove (all implementations verify each example in under 0.3 s), so the overheads contribute more significantly to the runtime. On examples where our implementations do not perform as well as Descartes, their performance remains reasonably close to it; these examples are typically smaller, and again overheads play a larger role in the weaker performance.

**Table 1.** Stackoverflow Benchmarks. Total times (in seconds) and Hoare triple counts (HTC) for Stackoverflow benchmarks, where for each property, the results for Syn and Synonym are divided into those for examples where they exhibit a factor of improvement over Descartes that is greater or equal to 1 (top) and those for which they do not (bottom). *Improv* reports the factor of improvement over Descartes, where the number of examples is given in parentheses.


#### **6.2 Modified Stackoverflow Benchmarks**

The original Stackoverflow examples are fairly small, with all implementations taking under 6 s to verify any example. To assess how we perform on larger examples, we modified several of the larger Stackoverflow comparator examples to be longer, take more arguments, and contain more control-flow decisions. The resulting functions take three arguments and pick the "largest" object's id, where comparison among objects is performed based on the original Stackoverflow example code. (Ties are broken by choosing the least id.) We check whether these *pick* functions satisfy the following properties that allow reordering input arguments:

P13: $\forall x, y, z.\ pick(x, y, z) = pick(y, x, z)$

P14: $\forall x, y, z.\ pick(x, y, z) = pick(y, x, z) \land pick(x, y, z) = pick(z, y, x)$

Note that P13 allows swapping the first two input arguments, while P14 allows any permutation of inputs, a useful hyperproperty.

The results from running property P13 are shown in Table 2. We see here that for these larger examples, Hoare triple counts are more reliably correlated with the time taken to perform verification. Syn outperforms Descartes on 14 of the 16 examples, and Synonym outperforms both Descartes and Syn on all 16 examples.

The results from running property P14 are shown in Table 3. For this property, note that Descartes is unable to verify any of the examples within a one-hour timeout. Meanwhile, Syn verifies 10 of the 16 examples without exceeding the timeout. Exploiting symmetries here yields an obvious improvement: Synonym not only verifies the same examples as Syn, with consistently faster performance on the larger examples, but also verifies an additional example within an hour.


**Table 2.** Verifying P13 for modified Stackoverflow examples. Times (in seconds) and Hoare triple counts (HTC).


**Table 3.** Verifying P14 for modified Stackoverflow examples. Times (in seconds) and Hoare triple counts (HTC). - indicates that no sufficient invariant could be inferred.

*Summary of Experimental Results.* Our experiments indicate that our performance improvements are consistent: on all Descartes benchmarks (in Table 1, which are all small) our techniques either have low overhead or show some improvement despite the overhead, and on the modified (bigger) programs they lead to significant improvements. In particular, we report (Table 2) speedups of up to 21.4x (on an example where the property does not hold) and 4.2x (on an example where it does). More importantly, we report (Table 3) that Descartes times out on 14 examples; of these, Synonym times out on only 2 and cannot infer an invariant for one example.

#### **7 Related Work**

The work most closely related to ours is by Sousa and Dillig [27], which proposed Cartesian Hoare Logic (CHL) for proving k-safety properties and the tool Descartes for automated reasoning in CHL. In addition to the core program logic, CHL includes additional proof rules for loops, referred to as Cartesian Loop Logic (CLL). A generalization of CHL, called Quantitative Cartesian Hoare Logic was subsequently used by Chen et al. [10] to detect side-channel vulnerabilities in cryptographic implementations.

In terms of comparison, neither CHL nor CLL forces alignment at conditional statements or takes advantage of symmetries. We believe our algorithm for identifying a maximal set of lockstep loops is also novel and can be used in other methods that do not rely on CHL/CLL. On the other hand, CLL proof rules allow not only fully lockstep loops, but also *partially* lockstep loops. Although we did not consider it here, our maximal lockstep-loop detection algorithm can be combined with their partial lockstep execution to further improve the efficiency of verification. For example, applying the Fusion 2 rule from CLL to our example while loops generated from *RVP*<sup>1</sup> (Sect. 2) would result in *three* subproblems and require reasoning twice about the second copy's loop finishing later. When combined with maximal lockstep-loop detection, we could generate just *two* subproblems: one where the first and third loops terminate first, and another where the second loop terminates first.

Other automatic efforts for relational verification typically use some kind of product programs [6,13,17,21,22,24,28], with a possible reduction to Horn solving [13,17,21,24]. Similarly to our strategy for synchrony, most of them attempt to leverage similarity (structural or functional) in programs to ease verification. However, we have seen less focus on leveraging relational specifications themselves to simplify verification tasks, although this varies according to the verification method used. Some efforts do not reason over product programs at all, relying on techniques based on decomposition [3] or customized theories with theorem proving [4,30] instead. To the best of our knowledge, none of these efforts exploit symmetry in programs or in relational specifications.

On the other hand, symmetry has been used very successfully in model checking parametric finite state systems [11,15,20] and concurrent programs [14]. Our work differs from these efforts in two main respects. First, the parametric systems considered in these efforts have components that interact with each other or share variables. Second, the correctness specifications are also parametric, usually single-index or double-index properties in a propositional (temporal) logic. In contrast, in our RVPs, the individual programs are independent and do not share any common variables. The only interaction between them is via relational specifications. Furthermore, we discover symmetries in these relational specifications over multi-index variables, expressed as formulas in first-order theories (e.g., linear integer arithmetic). We then exploit these symmetries to prune redundant RVPs during verification.
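The idea of exploiting symmetries in relational specifications can be sketched syntactically (the paper relies on an external graph automorphism tool; the representation below and all names are our own, for illustration only): a permutation of the program copies is a symmetry if renaming the copy indices in every atom of the specification yields the same set of atoms, and symmetric verification subproblems can then be pruned down to one representative per orbit.

```python
from itertools import permutations

def rename(atom, pi):
    """Apply a permutation of copy indices to one atom of the spec.
    An atom is modelled as (op, (var, copy), (var, copy), ...)."""
    op, *args = atom
    return (op, *((v, pi[i]) for (v, i) in args))

def symmetries(spec, k):
    """Permutations of the k program copies that map the spec to itself."""
    return [pi for pi in permutations(range(k))
            if {rename(a, pi) for a in spec} == set(spec)]

def prune(subproblems, syms):
    """Keep one representative per orbit of subproblems under the symmetries."""
    seen, kept = set(), []
    for sp in subproblems:
        if sp not in seen:
            kept.append(sp)
            for pi in syms:
                seen.add(frozenset(rename(a, pi) for a in sp))
    return kept

# Symmetric relational spec over two copies: x_0 <= x_1 and x_1 <= x_0.
spec = frozenset({('le', ('x', 0), ('x', 1)), ('le', ('x', 1), ('x', 0))})
syms = symmetries(spec, 2)            # the identity and the swap (1, 0)
# Two subproblems differing only in which copy's loop exits first:
subs = [frozenset(spec | {('exits_first', ('loop', i))}) for i in range(2)]
assert len(prune(subs, syms)) == 1    # the swap makes one of them redundant
```

With a fully symmetric specification over two copies, half of the case-split subproblems are discharged for free, which mirrors the pruning of redundant RVPs described above.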

There are also some similarities between relational verification and verification of concurrent/parallel programs. In the latter, a typical verifier [18] would use *visible* operations (i.e., synchronization operations or communication on shared state) as synchronizing points in the composed program. In our work, this selection is made based on the structure of the component programs and the ease of utilizing or deriving relational assertions for the code fragments. Furthermore, one does not need to consider different orderings in interleavings of programs in the RVPs. Since these fragments are independent, it suffices to explore any one ordering. Instead, we exploit symmetries in the relational assertions to prune away redundant RVPs.

Finally, specific applications may impose additional synchrony requirements pertaining to visibility. For example, one may want to check for information leaks from private inputs to public outputs not only at the end of a program but at other specified intermediate points, or information leakage models for side-channel attacks may check for leaks based on given observer models [1]. Such requirements can be viewed as relational specifications at selected synchronizing points in the composed program. Again, we can leverage these relational specifications to simplify the resulting verification subproblems.

#### **8 Conclusions and Future Work**

We have proposed novel techniques for improving relational verification, which has several applications including security verification, program equivalence checking, and regression verification. Our two key ideas are maximizing the amount of code that can be synchronized and identifying symmetries in relational specifications to avoid redundant subtasks. Our prototype implementation on top of the Descartes verification tool leads to consistent improvements on a range of benchmarks. In the future, we would be interested in implementing these ideas on top of a Horn-based relational verifier (e.g., [25]) and extending it to work with recursive data structures. We are also interested in developing an algorithm for finding symmetries in formulas that does not rely on an external graph automorphism tool.

**Acknowledgements.** We gratefully acknowledge the help from Marcelo Sousa and Işil Dillig on their Descartes tool, which provides the base for our prototype development and experimental comparison. This work was supported in part by NSF Grant 1525936.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **JBMC: A Bounded Model Checking Tool for Verifying Java Bytecode**

Lucas Cordeiro<sup>1,2(B)</sup>, Pascal Kesseli<sup>1</sup>, Daniel Kroening<sup>1,2</sup>, Peter Schrammel<sup>1,3</sup>, and Marek Trtik<sup>1</sup>

> <sup>1</sup> Diffblue Ltd., Oxford, UK
> <sup>2</sup> University of Oxford, Oxford, UK
> <sup>3</sup> University of Sussex, Brighton, UK
> lucas.cordeiro@manchester.ac.uk

**Abstract.** We present *Java Bounded Model Checker* (JBMC), a bounded model checking tool for verifying Java bytecode that is built on top of the CPROVER framework. JBMC processes Java bytecode together with a model of the standard Java libraries and checks a set of desired properties. Experimental results show that JBMC can correctly verify a set of Java benchmarks from the literature and that it is competitive with two state-of-the-art Java verifiers.

#### **1 Introduction**

The Java Programming Language is a general-purpose, concurrent, strongly typed, object-oriented language [13]. Applications written in Java are compiled to the bytecode instruction set and binary format as defined in the Java Virtual Machine (JVM) specification. This compiled Java bytecode can run on all platforms on top of a JVM without the need for recompilation. However, Java programs may have bugs, which may result in array bound violations, unintended arithmetic overflows, and other kinds of functional and runtime errors. In addition, Java allows multi-threading, and thus, problems such as race conditions and deadlocks can occur.

To detect such issues, we developed an extension to the C Bounded Model Checker (CBMC) [6], named JBMC,<sup>1</sup> that verifies Java bytecode. JBMC consists of a frontend for parsing Java bytecode and a Java operational model (JOM), which is an exact but verification-friendly model of the standard Java libraries. A distinct feature of JBMC, when compared with other approaches [2,7,9], is the use of Bounded Model Checking (BMC) [4] in combination with Boolean Satisfiability and Satisfiability Modulo Theories (SMT) [3] and full symbolic state-space exploration, which allows us to perform a bit-accurate verification

Supported by the ERC project 280053 CPROVER and the H2020 FET OPEN project 712689 SC<sup>2</sup>.

<sup>1</sup> Available at https://www.cprover.org/jbmc/.

© The Author(s) 2018

H. Chockler and G. Weissenbacher (Eds.): CAV 2018, LNCS 10981, pp. 183–190, 2018. https://doi.org/10.1007/978-3-319-96145-3_10

of Java programs. Apart from JBMC, there are other Java verifiers, which use different verification approaches.

**Existing Java Verifiers.** *JayHorn* is a verifier for Java bytecode [9] that uses the Java optimization framework Soot [14] as a front-end and then produces a set of constrained Horn clauses to encode the verification condition (VC). *Java Path Finder* (JPF) is an explicit-state and symbolic software model checker for Java bytecode [2]. JPF is used to find and explain defects, collect runtime information such as coverage metrics, deduce test vectors, and create corresponding test drivers for Java programs. JPF checks for property violations such as deadlocks or unhandled exceptions along all potential execution paths, as well as user-specified assertions. *ESC/Java* is a compile-time extended static checker, which detects common programming errors (e.g., null dereference, array bounds errors, and type cast errors) [7]. It uses an automatic theorem prover to catch bugs that go beyond the abilities of the Java type checker, including runtime errors and synchronization errors in concurrent programs.

#### **2 JBMC: A Bounded Model Checker for Java Bytecode**

#### **2.1 Architecture and Implementation**

Our front-end integrates a class loader, which accepts Java bytecode *class* files and *jar* archives (Fig. 1). The parse trees for the classes are translated into the CPROVER CFG representation, which is called a *GOTO* program [6].

**Fig. 1.** JBMC verification process

To handle polymorphism, JBMC encodes virtual method dispatch into a *switch* over the runtime type information attached to the object in order to select the correct method to be called. Similarly, the complex control flow arising from exceptions is encoded into conditional branches. We record the exception thrown in a global variable, which is then used to propagate the exception up the call stack until a matching *catch* statement (if any) to handle the error is reached. JBMC can detect when the JVM would abort due to an exception that is not caught within the program.
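The encoding of dynamic dispatch as a switch over runtime type information can be sketched as follows (a language-neutral toy in Python, not JBMC's actual GOTO-level encoding; the class and field names are invented):

```python
def square_area(obj):
    return obj["side"] * obj["side"]

def circle_area(obj):
    return 3.14159 * obj["r"] * obj["r"]

def call_area_virtual(obj):
    """Dynamic dispatch as the JVM performs it: look up the method
    through the object's runtime type."""
    vtable = {"Square": square_area, "Circle": circle_area}
    return vtable[obj["@type"]](obj)

def call_area_lowered(obj):
    """The same call after the lowering: an explicit switch over the
    runtime type tag, one static branch per possible implementation,
    which a symbolic engine can explore as ordinary control flow."""
    t = obj["@type"]
    if t == "Square":
        return square_area(obj)
    elif t == "Circle":
        return circle_area(obj)
    raise AssertionError("incomplete dispatch: unknown runtime type " + t)

sq = {"@type": "Square", "side": 3}
assert call_area_lowered(sq) == call_area_virtual(sq) == 9
```

The lowered form makes the set of call targets syntactically explicit, which is what allows bounded model checking to treat a virtual call like any other branch.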

The resulting *GOTO* program is then passed to the bounded model checking algorithm for finding bugs. The BMC algorithm symbolically executes the program, unwinding loops and unfolding recursive function calls up to a given bound. The resulting bit-vector formula is then passed on to the configured SAT or SMT solver [6].
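The unwinding step can be mimicked concretely (a toy sketch, not CBMC's symbolic encoding: we replace the solver query on the unwound formula by exhaustive enumeration of small inputs, and model the unwinding assertion with a Python `assert`):

```python
def sum_below_unwound(n, k):
    """Unwound version of:  i = 0; s = 0; while i < n: s += i; i += 1
    Each of the k copies of the loop body is guarded by the loop condition;
    the final assert plays the role of BMC's 'unwinding assertion', which
    fails if the bound k was too small to exhaust the loop."""
    i = s = 0
    for _ in range(k):          # k syntactic copies of the guarded body
        if i < n:
            s, i = s + i, i + 1
    assert not (i < n), "unwinding assertion: bound k too small"
    return s

# Bounded verification: exhaust all inputs up to the bound and check the
# property s == n*(n-1)/2 on every path.
for n in range(7):
    assert sum_below_unwound(n, k=7) == n * (n - 1) // 2
```

If the unwinding assertion never fails for the chosen bound, any property proved on the unwound program holds for the original loop on those inputs.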

#### **2.2 Java Operational Model**

The Java language relies on compiler-generated functions and classes as well as a large standard library. In order to correctly support Java functionality, we developed an abstract representation of the standard Java libraries, called the operational model (OM). The use of OMs is commonplace in analysers for Java; for instance, a similar approach was previously proposed for the formal verification of Android applications [12]. Currently, our OM consists of models of the most common classes from *java.lang* and a few from *java.util*. Our Java OM simplifies the implementation of the standard Java library by removing verification-irrelevant performance optimizations (e.g., in the implementation of container classes), exploiting declarative specifications (using *assume*) and functions that are built into the CPROVER framework (e.g., for array and string manipulation). We are continuously extending our OM to speed up verification by replacing the original standard Java library classes by our models.

Java has an assert(*c*) statement for specifying safety properties. In addition, we provide API classes that allow users to define non-deterministic verification harnesses and stub functions. The API contains methods for primitive types (e.g., int nondetInt()) as well as *generic* methods (i.e., parametrised by a type T) such as <T> T nondetWithNull() and <T> T nondetWithoutNull(), which nondeterministically initialize object references that may or may not be null. The API also provides an assume(*c*) method, which advises JBMC to ignore paths that do not satisfy a user-specified condition *c*.
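The intended semantics of the nondet and assume primitives can be modelled by exhaustive exploration over a small input domain (a toy model of the API's meaning, not JBMC's symbolic implementation; all names below are our own):

```python
from itertools import product

class AssumeViolated(Exception):
    """Raised to discard a path whose assume() condition fails."""

def assume(c):
    if not c:
        raise AssumeViolated

def check_harness(harness, domain):
    """Run the harness on every choice of nondet inputs from `domain`,
    skipping paths pruned by assume(); collect assertion violations."""
    failures = []
    for choice in product(domain, repeat=harness.__code__.co_argcount):
        try:
            harness(*choice)
        except AssumeViolated:
            continue                 # path excluded by assume()
        except AssertionError:
            failures.append(choice)  # counterexample input
    return failures

# Hypothetical harness in the style of a verification harness:
def harness(x, y):
    assume(0 <= x <= y)              # restrict the nondet inputs
    assert y - x >= 0                # property to check

assert check_harness(harness, range(-2, 3)) == []
```

A failing `assert` under some admissible input choice corresponds to JBMC's VERIFICATION FAILED with the choice as the counterexample.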

Currently, JBMC handles neither the Java Native Interface, which allows Java code to interface native libraries, nor reflection, which allows the program to inspect and manipulate itself at runtime. We are currently extending JBMC to support generics and lambdas; and to verify multi-threaded Java programs (that use *java.lang.Thread*), exploiting the partial order encoding technique of [1].

#### **2.3 String Solver**

One of the biggest challenges in verifying Java programs is the widespread use of character strings, which makes verification problems resulting from Java programs highly complex. Solving such constraints is an active area of research [5,8,11]. JBMC implements a solver for strings to determine the satisfiability of a set of constraints involving string operations. Our string solver supports the most common basic accesses (e.g., obtain the length of a string and a character at a given position); comparisons (e.g., lexicographic comparison and equality); transformations (e.g., insertion, concatenation, replacement, and removal); and conversions (e.g., conversion of the primitive data types into a string and parsing them from a string). The axioms for these operations use quantified constraints. For instance, a Java expression s.substring(5) is translated into a predicate *substring*(*res, s,* 5), where *res*, *s* are pairs (*length, charArray*), representing the resulting and the input string s, respectively; and *substring* is axiomatized by the formula ∀*i.*(0 ≤ *i* ∧ *i < s.length* − 5) → (*res.length* = *s.length* − 5)∧(*res.charArray*[*i*] = *s.charArray*[*i* + 5]). The universal quantifiers are handled using quantifier elimination [10].
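The substring axiom above can be sanity-checked against Java's reference semantics on concrete strings (a finite check of one axiom instance, using Python slicing as the reference semantics; this is of course not the solver itself):

```python
def substring_axiom_holds(s, k=5):
    """Check the axiom instance for res = s.substring(k):
        res.length = s.length - k,  and
        for all i in [0, s.length - k):  res.charArray[i] = s.charArray[i + k]
    on a concrete string."""
    if len(s) < k:
        return True              # substring(k) requires len(s) >= k
    res = s[k:]                  # reference semantics of substring
    return (len(res) == len(s) - k and
            all(res[i] == s[i + k] for i in range(len(s) - k)))

assert all(substring_axiom_holds(s)
           for s in ["", "hello", "bounded model checking", "12345"])
```

In the solver, the same universally quantified constraint is handled symbolically by quantifier elimination rather than by enumeration.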

#### **2.4 JBMC Usage**

Runtime errors in Java (e.g., illegal memory access) are detected by the JVM, and an appropriate exception is thrown (e.g., NullPointerException, ArrayIndexOutOfBoundsException). An AssertionError is thrown on violation of a condition specified by the programmer using the assert keyword. JBMC analyzes the program and verifies whether such error conditions occur.

JBMC can be used to analyze a single class file:<sup>2</sup> jbmc C.class --unwind *k* or a Java archive (jar) file: jbmc file.jar --main-class class --unwind *k*. In both cases the entry point for the analysis of the program is the static void main method of the specified main class. *k* is a positive integer limiting the number of times loops are unwound and recursions are unfolded. If no bug is found, up to a *k*-depth unwinding, then JBMC reports VERIFICATION SUCCESSFUL; otherwise, it reports VERIFICATION FAILED along with a counterexample in the form of an execution trace (--trace), which contains the full variable assignment in each program state with file, method, and line information. Note that if the Java bytecode is compiled with debug information, then JBMC can also provide the original program variable names in the counterexample, rather than just bytecode variable slots. Further JBMC options can be retrieved via jbmc --help.

**Fig. 2.** Verification results for JayHorn, JBMC and JPF

<sup>2</sup> If a class C is in a package x.y, then compile it to *some-dir*/x/y/C.class, and in *some-dir* execute *jbmc-installation-dir*/jbmc x/y/C.class --unwind *k*.

**Fig. 3.** Runtime comparison of JBMC to JayHorn and JPF

#### **3 Experimental Evaluation**

There is no standard benchmark suite for Java verification. Therefore, we took our entire regression test suite consisting of 177 benchmarks (including known bugs and hard benchmarks that JBMC cannot yet handle); these benchmarks (denoted as "jbmc") test common Java features (e.g., polymorphism, exceptions, arrays, and strings). We also used 23 recursive benchmarks (denoted as "recursive") taken from the JayHorn repository [9], and 64 minepump benchmarks (denoted as "minepump") from the SV-COMP repository. Additionally, we have extracted 104 benchmarks from the JPF regression test suite [2]. The following table summarizes the characteristics of the benchmark sets:<sup>3</sup>


#### **3.1 Objectives and Setup**

Our experiments aim at answering two research questions: [RQ1] **(correctness)** How accurate is JBMC when verifying the chosen benchmarks? [RQ2] **(performance)** How does JBMC performance compare to other existing verifiers? To answer both questions, we analyze all benchmarks with three Java verifiers

<sup>3</sup> Benchmarks and detailed results are available at https://www.cprover.org/jbmc.

(JBMC v5.8-cav18, JayHorn v0.5.1, and JPF v32) on an Intel Core i7-6700 CPU at 8 × 3.40 GHz, with 32 GB of RAM, running Ubuntu 16.04 LTS. We restrict CPU time and memory to 300 s and 15 GB, respectively. JBMC uses a stepwise approach to unwinding loops (to prove unbounded safety) and runs with MiniSat2 as its SAT backend.

#### **3.2 Results**

Figure 2 gives an overview of the experimental results for the four benchmark suites. *Correct safe* means that the program was analyzed to be free of errors, *correct unsafe* means that the error in the program was found, *incorrect safe* means that the program had an error but the verifier did not find it, *incorrect unsafe* means that an error is reported for a program that fulfills the specification, *timeout* indicates that the verifier has exceeded the time limit, and *error* represents an internal failure in the verifier or exhaustion of available memory. The following table summarizes the overall results:


The experimental results show that JBMC reached a successful verification rate of approximately 89% while JayHorn reported 51% and JPF 75%, which positively answers RQ1. JayHorn and JPF currently produce 6 times more *incorrect* results (i.e., bugs in the tool) than JBMC. To answer RQ2, Fig. 3 compares the analysis times for the benchmarks where the tools return correct results. None of the three tools is consistently better than the other two. JBMC is faster than JPF on 176 benchmarks, JPF is faster than JBMC on 93. JBMC is faster than JayHorn on 222 benchmarks, whereas JayHorn is faster than JBMC on 25. In comparison to JayHorn, JBMC deals poorly with recursion, as its analysis led to timeout for 69% of the recursive benchmarks, whereas JayHorn could only solve a single benchmark from the minepump benchmark suite. In summary, we observed that JBMC's scalability depends mainly on the complexity of string operations, loops, recursion and (floating-point) arithmetic.

#### **4 Conclusions and Future Work**

Despite more than 15 years of research in BMC and Java verification, JBMC is the first BMC-based Java verifier. To achieve this, we based our implementation on an industrial-strength verification framework, and developed a Java OM, removing verification-irrelevant optimizations and exploiting declarative specifications and built-in functions. Because of the prevalent use of character strings in Java programs, we have also developed a string solver using an efficient quantifier elimination scheme. We compare JBMC to JayHorn and JPF, which are state-of-the-art verifiers for Java bytecode based on constrained Horn clauses and path-based symbolic execution, respectively. Experimental results show that JBMC achieves a successful verification rate of 89% compared to 51% of JayHorn and 75% of JPF. For future work, the Java OM will be extended to support more Java classes, with the goal of speeding up verification of larger Java applications. In addition, we are currently extending JBMC to verify multi-threaded programs.

**Acknowledgments.** We thank P. Rümmer and W. Visser for helpful discussions about JayHorn and JPF, respectively.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **Eager Abstraction for Symbolic Model Checking**

Kenneth L. McMillan(B)

Microsoft Research, Redmond, USA kenmcmil@microsoft.com

**Abstract.** We introduce a method of abstraction from infinite-state to finite-state model checking based on eager theory explication and evaluate the method in a collection of case studies.

#### **1 Introduction**

In constructing decision procedures for arithmetic formulas and other theories, a successful approach has been to separate propositional reasoning and theory reasoning in a modular way. This approach is usually called Satisfiability Modulo Theories, or SMT [1]. There are two primary approaches to SMT: *eager* and *lazy* theory explication. Both approaches abstract the formula in question by constructing its propositional skeleton, that is, converting each atomic predicate to a corresponding free Boolean variable. Obviously, propositional abstraction loses a great deal of information. The eager approach compensates for this by conjoining tautologies of the theory to the formula before propositional abstraction. In abstract interpretation terms, we can think of this as a *semantic reduction*: it makes the formula more explicit without changing its semantics. The lazy approach, on the other hand, performs the propositional abstraction first, then retroactively adds tautologies of the theory to rule out infeasible propositional models.
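A minimal formula-level illustration of the eager approach (our own toy example, checked by brute-force SAT): the formula (x = y) ∧ p(x) ∧ ¬p(y) is unsatisfiable in the theory of equality, but its propositional skeleton is satisfiable; conjoining the equality tautology (x = y ∧ p(x)) → p(y) *before* abstraction makes the skeleton unsatisfiable as well.

```python
from itertools import product

def satisfiable(f, nvars):
    """Brute-force SAT over nvars Boolean variables."""
    return any(f(*bs) for bs in product([False, True], repeat=nvars))

# phi = (x = y) /\ p(x) /\ ~p(y), unsatisfiable in the theory of equality.
# Skeleton variables: a for (x = y), b for p(x), c for p(y).
skeleton = lambda a, b, c: a and b and not c
assert satisfiable(skeleton, 3)        # the skeleton alone has a spurious model

# Eager explication: conjoin the tautology (x = y /\ p(x)) -> p(y)
# to phi before abstracting; its skeleton is (a /\ b) -> c.
explicated = lambda a, b, c: skeleton(a, b, c) and ((not (a and b)) or c)
assert not satisfiable(explicated, 3)  # the spurious model is ruled out
```

The lazy approach would instead find the spurious model of the bare skeleton first and only then add the same tautology to refute it.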

In this paper, we will consider applying the same concepts to the symbolic model checking problem (SMC). In this problem, we are given a Kripke model M that is expressed implicitly using logical formulas, and a temporal formula φ, and we wish to determine whether M |= φ. The states of the Kripke model are structures of a logic L over a given vocabulary, while the set of initial states I and the set of transitions T are expressed, respectively, by one- and two-vocabulary formulas. The atomic propositions in φ are also presumed to be expressed in L.

In the case where L is propositional logic, the Kripke model is finite-state, the SMC problem is PSPACE-complete, and many well-developed techniques are available to solve it in a heuristically efficient way. On the other hand, if L is a richer logic (say, Presburger arithmetic), SMC is usually undecidable. Here, we propose to solve instances of this problem by separating propositional reasoning and theory reasoning in a modular way, as in SMT. Given an SMC problem (I, T, φ), we will form its propositional abstraction by computing the propositional skeletons of I, T and φ. This abstraction is sound and allows us to apply well-developed tools for propositional SMC; however, it loses a great deal of information. To compensate for this loss, we will use incomplete eager theory explication. By controlling theory explication, the user controls the abstraction. We will call this general approach *eager symbolic model checking*, or ESMC.

**Related Work.** Because of the propositional abstraction, ESMC may at first seem to be a form of predicate abstraction [9]. This is not the case, however. Predicate abstraction uses a vocabulary of predicates to abstract the state, but does not abstract the theory itself. As a result, a decision procedure for the theory is needed to compute the best abstract transformer. This is problematic if the logic is undecidable, and in any event requires an exponential number of decision procedure calls in the worst case. In ESMC, the abstraction is performed in a purely syntactic way. One controls the abstraction by giving a set of axiom schemata to be instantiated and by introducing prophecy variables, as opposed to giving abstraction predicates. One effect of this is that the abstraction may depend on the precise syntactic expression of the transition relation.

The technique of "datatype reductions" [18] is also closely related. This method has been used to verify various parameterized protocols and microarchitectures using finite-state model checking [5,6,12,19,20]. The technique also abstracts an infinite-state SMC problem to a finite-state one syntactically. Though it does not do this by explicating the theory, we will see that the abstraction it produces can be simulated by ESMC. Compared to this method, ESMC is user-extensible and allows both a simpler theoretical account and a simpler implementation. Moreover, it uses a smaller trusted computing base, since the tautologies it introduces can be mechanically checked.

The methods of Invisible Invariants [25] and Indexed Predicate Abstraction [14] use different methods to compute the least fixed point in a finite abstract domain of quantified formulas. This requires decidability and incurs a relatively high cost for computing an extremal fixed point, limiting scalability (though IPA can approximate the best transformer in the undecidable case). The abstractions are also difficult to refine in practice.

**Road Map.** After preliminaries in the next section, we introduce our schema-based class of abstractions in Sect. 3. The next section gives some useful instantiations of this class. Section 5 describes a methodology for exploiting the abstraction in proofs of infinite-state systems, as implemented in the IVy tool. In Sect. 6, we evaluate the approach using case studies.

#### **2 Preliminaries**

Let FO<sub>=</sub>(S, Σ) be standard sorted first-order logic with equality, where S is a collection of first-order sorts and Σ is a vocabulary of sorted non-logical symbols. We assume a special sort B ∈ S that is the sort of propositions. Each symbol f<sup>S</sup> ∈ Σ has an associated sort S of the form D<sub>1</sub> × ··· × D<sub>n</sub> → R, where D<sub>i</sub>, R ∈ S and n ≥ 0 is the *arity* of the symbol. If n = 0, we say f<sup>S</sup> is a *constant*, and if R = B it is a *relation*. We write vocab(t) for the set of non-logical symbols occurring in term t.

Given a set of sorts S, a *universe* U maps each sort in S to a non-empty set (with U(B) = {⊤, ⊥}). An *interpretation* of a vocabulary Σ over universe U maps each symbol f<sup>D<sub>1</sub>×···×D<sub>n</sub>→R</sup> in Σ to a function in U(D<sub>1</sub>) × ··· × U(D<sub>n</sub>) → U(R). A Σ-structure is a pair M = (U, I) where U is a universe and I is an interpretation of Σ over U. The structure is a *model* of a proposition φ in FO<sub>=</sub>(S, Σ) if φ evaluates to ⊤ under I according to the standard semantics of first-order logic. In this case, we write M |= φ. Given an interpretation J with domain disjoint from I, we write M, J to abbreviate the structure (U, I ∪ J).

In the sequel, we take the vocabulary Σ to be a disjoint union of four sets: Σ<sub>S</sub>, the *state* symbols; Σ'<sub>S</sub>, the *primed* symbols; Σ<sub>T</sub>, the *temporary* symbols; and Σ<sub>B</sub>, the *background* symbols. We take (·)' to be a bijection Σ<sub>S</sub> → Σ'<sub>S</sub> and extend it in the expected way to terms and interpretations. We write unprime(t) for the term u such that u' = t, if such a u exists.

A *transition system* is a pair (I, T) where I is a proposition over Σ<sub>S</sub> ∪ Σ<sub>B</sub> and T is a proposition over Σ. Let M<sub>B</sub> = (U, I<sub>B</sub>) be a Σ<sub>B</sub>-structure (that is, fix the universe and the interpretation of the background symbols). A U-*state* of the system is an interpretation of Σ<sub>S</sub> (the state symbols) over U. An M<sub>B</sub>-*run* of the system is an infinite sequence s<sub>0</sub>, s<sub>1</sub>, ... of U-states such that:


That is, under the background interpretation, the initial state must satisfy the initial condition, and for every successive pair of states, there must be an interpretation of the temporary symbols such that the transition condition is satisfied. The temporary symbols are used, for example, to model local variables of procedures, and may also be Skolem symbols. Because they can have second-order sort, we cannot existentially quantify them within the logic, so instead we quantify them implicitly in the transition system semantics. Given a background theory T over Σ<sub>B</sub>, a T-*run* is any M<sub>B</sub>-run such that M<sub>B</sub> |= T.

A *linear temporal formula* over Σ applies the operators of FO<sub>=</sub>(S, Σ) plus the standard strict until operator U and strict since operator S. We define ◯φ = ⊥ U φ, □φ = φ ∧ ¬(⊤ U ¬φ), and also Hφ = φ S ⊥, meaning "always φ in the strict past". We fix T and say (I, T) |= φ if every T-run of (I, T) satisfies φ under the standard LTL semantics. The symbolic model checking problem SMC is to determine whether (I, T) |= φ.

#### **3 A Schema-Based Abstraction Class**

An *atom* is a proposition in which every instance of {∧,∨,¬, <sup>U</sup>, S} occurs under a quantifier. The *propositional skeleton* of a proposition φ is obtained by replacing each atom in φ by a corresponding propositional constant. The propositional skeleton is an abstraction, in the sense that for every model M of φ we can construct a model of its propositional skeleton from the truth values of each atomic proposition in M. We will use propositional skeletons here to convert an infinite-state model checking problem to a finite-state one.

We assume that each vocabulary Σ<sub>B</sub>, Σ<sub>S</sub> and Σ<sub>T</sub> contains a countably infinite set of propositional constants. This allows us to construct injections A<sub>B</sub>, A<sub>S</sub>, A<sub>T</sub> from atomic propositions of the logic to propositional constants in Σ<sub>B</sub>, Σ<sub>S</sub> and Σ<sub>T</sub> respectively.

In defining the propositional skeleton of a transition formula we must consider atomic propositions containing symbols from more than one vocabulary. To which vocabulary should we map such an atom in the propositional skeleton? Here, we take a simple solution that is sound, though it may lose some state information. That is, for any atomic proposition φ, we say


That is, pure background propositions are abstracted to background symbols, state propositions are abstracted to state symbols and next-state propositions are abstracted to the primed version of the corresponding state proposition. Everything else is abstracted to a temporary symbol (which is existentially quantified in the abstract transition relation).

We then extend A to non-atomic formulas in the obvious way, such that A(φ ∧ ψ) = A(φ) ∧ A(ψ), and similarly for the other connectives and temporal operators. The following theorem shows that we can use propositional skeletons to convert infinite-state to finite-state model checking problems in a sound (but incomplete) way:

**Theorem 1.** *For any symbolic transition system* (I, T) *and linear temporal formula* φ*, if* (A(I), A(T)) |= A(φ) *then* (I, T) |= φ*.*

Intuitively, this holds because we can convert every concrete counterexample to an abstract one by simply extracting the truth values of the atomic propositions.
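As an illustration only (hypothetical helper names, not IVy's implementation), a propositional skeleton can be computed by a recursive traversal that replaces each atom with a propositional constant, reusing constants for syntactically identical atoms:

```python
# Minimal sketch of propositional skeletonization (hypothetical helper,
# not the paper's implementation). Formulas are nested tuples:
# ('and', f, g), ('or', f, g), ('not', f); anything else is an atom.

def skeleton(formula, atom_map):
    """Replace every atom by a propositional constant, reusing constants
    for syntactically identical atoms (the injection A)."""
    if isinstance(formula, tuple) and formula[0] in ('and', 'or', 'not'):
        op, *args = formula
        return (op, *(skeleton(a, atom_map) for a in args))
    # An atom: map it to a fresh (or previously assigned) constant.
    if formula not in atom_map:
        atom_map[formula] = 'p%d' % len(atom_map)
    return atom_map[formula]

atoms = {}
phi = ('and', ('not', 'x = y'), ('or', 'x = y', 'p(x)'))
print(skeleton(phi, atoms))  # ('and', ('not', 'p0'), ('or', 'p0', 'p1'))
```

A model of the skeleton is obtained from a model of φ by reading off the truth value assigned to each atom.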

**Theory Explication.** While propositional skeletons are sound, they lose a great deal of information. For example, suppose our transition relation is y′ = x. Given a predicate p, we would like to infer that p(x) ⇒ p(y′). However, in the propositional skeleton, the transition relation A(T) is just A_T(y′ = x). In other words, it is just a free propositional symbol with no relation to any other proposition. Thus, we cannot prove the abstracted property A_S(p(x)) ⇒ A_S(p(y))′.

To mitigate this loss of information, we use *theory explication*. That is, before abstracting T, we conjoin to it tautologies of the logic or the background theory. This doesn't change the semantics of T, and thus the set of runs of the transition system remains unchanged. It does, however, change the propositional skeleton. For example, y′ = x ∧ p(x) ⇒ p(y′) is a tautology of the theory of equality. If we conjoin this formula to T in the above example, the abstract transition relation becomes A_T(y′ = x) ∧ (A_T(y′ = x) ∧ A_S(p(x)) ⇒ A_S(p(y))′), which is strong enough to prove the abstracted property.
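The effect can be illustrated by a brute-force propositional check (a sketch; the Boolean names e, px, py standing for the abstracted atoms are invented here): without the explicated instance, the skeleton of the transition relation is just a free Boolean and is too weak to entail the property, while with it the entailment is propositionally valid.

```python
from itertools import product

def implies(a, b):
    return (not a) or b

# e stands for the skeleton of the transition-relation atom, px and py
# for the skeletons of the predicate applied to the current and next
# state (illustrative names).
def t_plain(e, px, py):        # skeleton alone: a free propositional symbol
    return e

def t_explicated(e, px, py):   # with the explicated tautology conjoined
    return e and implies(e and px, py)

def entails_property(t):
    """Does t ∧ px ⇒ py hold in every Boolean valuation?"""
    return all(implies(t(e, px, py) and px, py)
               for e, px, py in product([False, True], repeat=3))

print(entails_property(t_plain))       # False: too much information lost
print(entails_property(t_explicated))  # True: explication restores it
```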

In general, theory explication adds predicates to the abstraction. This is the only mechanism we will use to add predicates; we will not supply them manually, or obtain them automatically from counterexamples. The following theorem justifies model checking with eager theory explication:

**Theorem 2.** *For any symbolic transition system* (I, T)*, linear temporal formula* φ*,* Σ_B ∪ Σ_S *formula* ψ_I *and* Σ *formula* ψ_T*, if* ψ_I *and* ψ_T *are tautologies of the background theory, then* (I ∧ ψ_I, T ∧ ψ_T) |= φ *iff* (I, T) |= φ*.*

The question, of course, is how to choose the tautologies ψ_I and ψ_T. This is not just a question of capturing the semantics of the transition relation, since theory explication also determines the first-order predicates representing the state of the finite abstraction. Thus, complete theory explication is at least as hard as predicate discovery in predicate abstraction. Our goal is not to solve this problem, but to find an effective incomplete strategy that is useful in practice. It is important that the resulting finite-state model checking problems be easily resolved by a modern model checker, and that, in case the strategy fails, a human can use the resulting counterexample to effectively refine the abstraction.

**Schema-Based Theory Explication.** The basic approach we will use to control theory explication is a restricted case of the pattern-based quantifier instantiation method introduced in the Simplify prover [8]. That is, we are given a set of axioms and, for each axiom, a set of triggers. A trigger is a term (or terms) containing all of the free variables in the axiom. The trigger is matched against all ground subterms in the formula being explicated. Each match induces an instance of the axiom.

In our example above, suppose we have the axiom Y = X ∧ p(X) ⇒ p(Y) with a trigger Y = X (here and in the sequel, capital letters stand for free variables). The trigger Y = X matches the ground term y′ = x in T, which generates the ground instance y′ = x ∧ p(x) ⇒ p(y′). Since we match modulo the symmetry of equality, we also get x = y′ ∧ p(y′) ⇒ p(x).

A risk of trigger-based instantiation is the matching loop. For example, if we have the axiom f(X) > X + 1 with a trigger f(X), then we can generate an infinite sequence of instantiations: f(y) > y + 1, f(f(y)) > f(y) + 1, and so on. A simple approach to preventing this is to bound the number of generations of matching. In practice, we will use just one generation and expand the set of axioms in cases where more than one generation is needed. This has the benefit of keeping the number of generated terms small, which limits the size of the SMC problem and also makes it easier for users to understand counterexamples.
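A minimal sketch of one-generation trigger matching (terms as nested tuples, uppercase strings as pattern variables; illustrative code, not Simplify's or IVy's). Because triggers are matched only against subterms that already occur in the formula, the looping instance f(f(y)) > f(y) + 1 is never generated:

```python
# Sketch of pattern-based instantiation with a single matching generation.

def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def match(pattern, term, subst):
    """Try to extend subst so that pattern matches the ground term."""
    if is_var(pattern):
        if pattern in subst:
            return subst[pattern] == term
        subst[pattern] = term
        return True
    if isinstance(pattern, tuple) and isinstance(term, tuple):
        return (len(pattern) == len(term) and pattern[0] == term[0]
                and all(match(p, t, subst)
                        for p, t in zip(pattern[1:], term[1:])))
    return pattern == term

def subterms(t):
    yield t
    if isinstance(t, tuple):
        for arg in t[1:]:
            yield from subterms(arg)

def instantiate(axiom, trigger, formula_terms):
    """One generation: match the trigger against existing ground subterms only,
    never against terms produced by earlier matches."""
    instances = []
    for g in formula_terms:
        for sub in subterms(g):
            s = {}
            if match(trigger, sub, s):
                instances.append((axiom, dict(s)))
    return instances

# Trigger f(X) matched against the ground term f(y): one instance, X ↦ y.
print(instantiate('f(X) > X + 1', ('f', 'X'), [('f', 'y')]))
```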

To avoid having to write a large number of axioms, we specify the axioms using general schemata. A schema is a parameterized axiom. It takes a list of sorts and symbols as parameters and yields an axiom. In the sequel we will use s and t to stand for sort parameters. As an example, here is a general congruence schema that can be used in place of our axiom above:

$$\frac{f:s\to t}{X=Y \Rightarrow f(X)=f(Y)\ \{X=Y\}}$$

The trigger is in curly braces. We first instantiate the axiom schemata for all possible parameter valuations using the sorts and symbols of the concrete system. Then we ground the resulting axioms using pattern-based instantiation.
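For instance, instantiating the congruence schema for every unary function symbol in a signature might look like the following sketch (the signature and naming scheme here are invented for illustration):

```python
# Sketch: instantiating a congruence schema for every symbol whose type
# matches the schema's parameters (signature and names are illustrative).

signature = {'state': ('proc', 'status'), 'ticket': ('proc', 'num')}

def congruence_axioms(signature):
    """For each f : s -> t, emit  X = Y => f(X) = f(Y)  with trigger X = Y."""
    axioms = []
    for f, (s, t) in sorted(signature.items()):
        axioms.append({'axiom': f'X = Y => {f}(X) = {f}(Y)',
                       'trigger': 'X = Y', 'sorts': (s, t)})
    return axioms

for a in congruence_axioms(signature):
    print(a['axiom'])
# X = Y => state(X) = state(Y)
# X = Y => ticket(X) = ticket(Y)
```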

One further technique is needed, however, to ground the quantifiers occurring in the formula being explicated. Quantifiers usually occur in the transition relations of parameterized systems either in the guards of guarded commands or in state updates. As an example, suppose a given command sets the state of process p to 'ready'. This would appear in the transition formula as a constraint such as the following:

$$\forall x. \text{ state}'(x) = \text{ready if } x = p \text{ else state}(x)$$

If this quantifier is not instantiated, then all information about process state will be lost. To avoid this, we would like to apply the following schema:

$$\frac{y:s, \ p:s \to \mathbb{B}}{(\forall X. \ p(X)) \Rightarrow p(y) \ \{\forall X. \ p(X)\}}$$

Here we intend that p should match *any* predicate with one free variable and not just a predicate symbol (including non-temporal sub-formulas of the property to be proved). However, rather than implement a general second-order matching scheme, it is simpler to build this particular schema into the theory explication process. There is some question as to which ground terms to supply for the parameter y. As with other schemata, only constants are used in the current implementation. This appears to be adequate, but it might also be useful to allow the user to supply explicit triggers for quantifiers in the transition system or property.

The theory explication process thus has three steps:

1. Ground the quantifiers occurring in the formula being explicated, using the quantifier schema above.
2. Instantiate the axiom schemata for all parameter valuations over the sorts and symbols of the concrete system.
3. Ground the resulting axioms by matching their triggers against the ground subterms of the formula.

Notice this is a slight departure from the policy of one generation of matching, since terms generated in step 1 can be used to match axioms in step 3. This is important in practice since, without grounding the quantifiers, there may be no ground terms to match in step 3.

#### **4 Example Abstractions in the Class**

A typical approach to verifying parameterized protocols with finite-state model checking is to track the state of a representative fixed collection of processes and abstract away the state of the remaining processes. In this approach, introduced in [17], a small collection of background constants (typically two or three) is used to identify the tracked processes. For each process identifier in the system, the abstraction records whether it is equal to each of the tracked ids, but carries no further information. For each function f over process ids, the abstraction maintains the value of f(x) only if x is equal to one of the background constants. This approach has been used, for example, to verify processor microarchitectures [12,16,17] and cache coherence protocols [5,6,19].

This abstraction can be implemented using schema-based instantiation. The high-level idea is to create a set of schemata that make it possible to abstractly evaluate terms in a bottom-up manner.

For example, consider an occurrence t = u of the equality operator, where t and u are terms of sort s. The abstract value of this term is ⊤ if t and u are both equal to some background constant c, ⊥ if t = c and u ≠ c (or vice versa), and otherwise is unknown. To implement this abstraction, we use the following schemata:

$$\frac{c:s}{X=c \land Y=c \Rightarrow X=Y \; \{X=Y\}} \quad \frac{c:s}{X=c \land Y \neq c \Rightarrow X \neq Y \; \{X=Y\}}$$

The triggers of these two schemata cause them to be applied to every occurrence of an equality operator in the formula being abstracted.
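The resulting abstract semantics of equality can be sketched as a three-valued evaluator (True/False/None for unknown; the representation is invented for illustration), where each term carries only its known relations to the background constants:

```python
# Sketch of the abstract semantics of equality: each term's abstract value
# records, for each background constant, whether the term equals it
# (True/False/None for unknown). Constant names are illustrative.

BG = ['a', 'b']  # background constants

def abs_eq(t_info, u_info):
    """Abstract value of t = u from the terms' relations to BG constants."""
    for c in BG:
        if t_info[c] is True and u_info[c] is True:
            return True          # both equal the same constant
        if t_info[c] is True and u_info[c] is False:
            return False         # t = c but u != c
        if t_info[c] is False and u_info[c] is True:
            return False
    return None                  # unknown

t = {'a': True,  'b': False}     # t equals a, differs from b
u = {'a': False, 'b': None}      # u differs from a; relation to b unknown
print(abs_eq(t, u))  # False
```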

For an application f(t) of a function symbol, the abstract value is the abstraction of f(c) if t is equal to background constant c, and is otherwise unknown. This fact could be captured by chaining the congruence schema above with the two equality schemata. That is, matching the congruence schema, we obtain t = c ⇒ f(t) = f(c). Then matching the equality operator schemata against this result, we obtain (in the contrapositive) f(t) = f(c) ∧ f(c) = d ⇒ f(t) = d and f(t) = f(c) ∧ f(c) ≠ d ⇒ f(t) ≠ d (for any background constants c, d). Recall, however, that we allow only one generation of matching, so this second matching step will not occur. Instead, we write the above two facts explicitly as a schema:

$$\frac{c:s, \ d:t, \ f:s \to t}{X = c \Rightarrow (f(X) = d \Leftrightarrow f(c) = d) \ \{f(X)\}}$$

This schema is matched for every application of a symbol of arity one in the formula. We also specify similar schemata for arities greater than one. Notice that this schema also applies to relation symbols if we treat ⊤ and ⊥ as background constants of sort 𝔹. However, for relations and functions to finitely enumerated sorts, it is more efficient to use the congruence schema, since it produces fewer instances.

Finally, we need one additional schema to guarantee that the abstract values are consistent with the equality relation on the background constants:

$$\frac{c:s, \ d:s}{X = c \Rightarrow (X = d \Leftrightarrow c = d) \ \{X\}}$$

Notice that this axiom is instantiated for every term in the formula (though in practice not for propositions). Though it doesn't affect the satisfiability of formulas, it is also helpful to add reflexivity, symmetry and transitivity over the background constants, as this makes the resulting counterexamples easier to understand.

These schemata produce an abstraction of the formula that is at least as strong as the datatype reduction for scalarset types described in [18]. In fact, this is true if we restrict the application of the schemata to constants c and d in the set of background constants, which we do in practice. The cost of the abstraction is moderate, since the number of axiom instances is directly proportional to the size of the formula and to the number of background constants.

An advantage of the schema-based explication approach is that we can use it to construct abstractions for various datatypes and even use different abstractions of the same datatype for different applications. As an example, consider an abstraction for totally ordered datatypes such as the integers. We want the abstraction to track, for any term t of this sort, whether it is equal to, less than or greater than each background constant. The abstract value of a term t is captured by the values of the predicates t < c and t = c for background constants c. We begin with the abstract semantics of equality given above. The abstract semantics of the < relation can be given by the following schemata (where t ≤ c is an abbreviation for t < c ∨ t = c):

$$\frac{c:s}{X \le c \land c < Y \Rightarrow X < Y \ \{X < Y\}} \quad \frac{c:s}{X < c \land c \le Y \Rightarrow X < Y \ \{X < Y\}}$$

$$\frac{c:s}{Y \le c \land \neg(X < c) \Rightarrow \neg(X < Y) \ \{X < Y\}}$$

By chaining the congruence schema with these, we can obtain the abstract semantics of function application, but again we wish to limit the number of matching generations to one. Thus, as with equality, we write an explicit schema combining the two steps:

$$\frac{c:s, \ d:t, \ f:s \to t}{X = c \Rightarrow (f(X) < d \Leftrightarrow f(c) < d) \ \{f(X)\}}$$

We also require that the abstract value of every term be consistent with the interpretation of = and < over the background constants. This gives us:

$$\frac{c:s}{\neg(X = c \land X < c) \ \{X\}} \quad \frac{c:s, \ d:s}{X \le d \land \neg(X < c) \Rightarrow c \le d \ \{X\}}$$

With the equality schemata, these imply that the background constants are totally ordered. As an extension, if the totally ordered sort has a least element 0, we can add it as a background constant along with the axiom ¬(X < 0).
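This ordering abstraction can likewise be sketched as a three-valued evaluator (illustrative code, not IVy's): each term records, per background constant c, the truth values of t < c and t = c, and the three schemata become case checks:

```python
# Sketch of the abstract semantics of < for a totally ordered sort: each
# term's abstract value is a pair (t < c, t = c) per background constant,
# with True/False/None for unknown. The constants a < b are illustrative.

BG = ['a', 'b']            # background constants, in increasing order

def le(info, c):           # t <= c  as  t < c  or  t = c  (None-aware)
    lt, eq = info[c]
    if lt is True or eq is True:
        return True
    if lt is False and eq is False:
        return False
    return None

def abs_lt(x, y):
    """Abstract value of x < y, following the three schemata."""
    for c in BG:
        # X <= c  and  c < Y  implies  X < Y  (c < Y means not Y <= c)
        if le(x, c) is True and y[c] == (False, False):
            return True
        # X < c  and  c <= Y  implies  X < Y
        if x[c][0] is True and le(y, c) is True:
            return True
        # Y <= c  and  not (X < c)  implies  not (X < Y)
        if le(y, c) is True and x[c][0] is False:
            return False
    return None

x = {'a': (False, True), 'b': (True, False)}   # x equals a (so x < b)
y = {'a': (False, False), 'b': (None, None)}   # y > a; relation to b unknown
print(abs_lt(x, y))  # True: x <= a and a < y
```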

This abstraction is a bit weaker than the "ordset" abstraction used, for example, in [20]. We can simulate that abstraction by adding schemata that interpret the + operator, and facts about numeric constants such as 0 < 1. In general, for a given datatype, we can tailor an abstraction that captures just the properties of that type needed to prove a given system property. This extensibility makes the schema-based approach more flexible and possibly more efficient than the built-in abstractions of [18]. The above schemata have been verified by Z3.

#### **5 Proof Methodology**

In the previous sections, we developed an approach to producing a sound finite-state abstraction of an infinite-state system using eager theory explication and propositional skeletons. Now we consider how to construct proofs of systems using this approach. This section is essentially a summary of some results in [18].

The first question that arises is how to obtain the set of background constants that determine the abstraction. Generally speaking, these arise as prophecy variables. For example, suppose we wish to prove a mutual exclusion property of the form G ∀x, y. p(x) ∧ p(y) ⇒ x = y. To do this, we replace the bound variables x and y with fresh background constants a and b, to obtain the quantifier-free property G (p(a) ∧ p(b) ⇒ a = b). In effect, a and b are immutable prophecy variables that predict the values of x and y for which the property will fail. By introducing prophecy variables, we refine the abstraction so that it tracks the state of the pair of processes that ostensibly cause the mutual exclusion property to fail. We hope, of course, to prove that there are no such processes. We apply the following theorem to introduce prophecy variables soundly:

**Theorem 3.** *Let* (I, T) *be a symbolic transition system,* x:s *a variable,* φ(x) *a temporal formula and* v:s *a background symbol not occurring in* I, T, φ*. Then* (I, T) |= G ∀x. φ(x) *iff* (I, T) |= G φ(v)*.*

This theorem can be applied as many times as needed to eliminate universal quantifiers from an invariance property. Further refinement can be obtained if needed by manually adding prophecy variables. For example, suppose that each process x has a ticket number t(x), and we wish to track the ticket number held by process a at the time of the failure. To do this, we replace our property with the property c = t(a) ⇒ (p(a) ∧ p(b) ⇒ a = b), where c is a fresh background constant. In general, we can introduce additional prophecy variables using this theorem:

**Theorem 4.** *Let* (I, T) *be a transition system,* φ *a temporal formula and* t *a term. Then* (I, T) |= G φ *iff* (I, T) |= G ∀x. x = t ⇒ φ*, where* x *is not free in* φ*.*

The theorem can be applied repeatedly to introduce as many prophecy variables as needed to refine the abstraction. The introduced quantifiers can be converted to background symbols by the preceding theorem.

Since our abstraction tracks the state of only processes a and b, a protocol step in which an untracked process sends a message to a or b is likely to produce an incorrect result in the abstraction. To mitigate this problem, we assume by induction over time that our universally quantified invariant property φ has always held in the strict past. This makes use of the following theorem:

**Theorem 5.** *Let* (I, T) *be a symbolic transition system, and* φ *a temporal formula. Then* (I, T) |= G φ *iff* (I, T) |= G (Hφ ⇒ φ)*.*

The quantifiers in φ will be instantiated with ground terms in T. Thus, in our mutual exclusion example, we can rely on the fact that the sender of a past message (identified by some temporary symbol) is not in its critical section if either a or b is. Using induction in this way can mitigate the loss of information in the finite abstraction. Note that we can pull quantifiers out of the above implication in order to apply Theorem 3. That is, (H ∀x. φ) ⇒ ∀x. φ is equivalent to ∀x. ((H ∀x. φ) ⇒ φ).
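The induction principle behind this step can be checked by brute force on finite traces (a sketch for a non-temporal φ, with H read as the strict past): requiring "Hφ implies φ" at every step is the same as requiring φ at every step.

```python
from itertools import product

# Sketch: a trace assigns a Boolean (the truth of a non-temporal phi)
# to each step. With H as the strict past, "globally (H phi => phi)"
# coincides with "globally phi" -- induction over time.

def globally(trace):
    """G phi: phi holds at every step."""
    return all(trace)

def induction_form(trace):
    """G(H phi => phi): at each step i, if phi held at all j < i,
    then phi holds at i."""
    return all((not all(trace[:i])) or trace[i] for i in range(len(trace)))

# The two coincide on every trace (checked exhaustively up to length 5).
assert all(globally(t) == induction_form(t)
           for n in range(6) for t in product([False, True], repeat=n))
print("G phi and G(H phi => phi) agree on all traces up to length 5")
```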

If the above tactics fail to prove an invariant property because the abstraction loses too much information, we can strengthen the invariant by adding conjuncts to it. These conjuncts have been called "non-interference lemmas", since they serve to reduce the interference with the tracked processes that is caused by loss of information about the untracked processes. We use the following theorem:

**Theorem 6.** *Let* (I, T) *be a symbolic transition system, and* φ, ψ *temporal formulas. If* (I, T) |= φ ∧ ψ*, then* (I, T) |= φ*.*

The general proof approach has the following steps:


**Implementation in IVy.** This approach has been implemented in the IVy tool [15]. In IVy, the state of the model is expressed in terms of mutable functions and relations over primitive sorts. The language is procedural, and allows the expression of protocol models as interleavings of atomic guarded commands, the semantics of which is expressible in first-order logic.

To implement the approach, IVy's language was augmented with a syntax for expressing schemata. The schemata of Sect. 4 were added to the tool's standard library. Syntax is also provided to decorate invariant assertions with terms to be used as prophecy variables. IVy extends the above theory slightly by allowing invariant properties to be asserted not only between commands, but also in the middle of sequential commands. This can be convenient, since it allows invariants to reference local variables inside the commands.

With this input, the tool applies the six transformation steps detailed above to produce a purely propositional SMC problem. This problem is then converted to the AIGER format [2], a standard for hardware model checking. At present, the system only handles safety properties of the form G (Hφ ⇒ φ), where φ is non-temporal. The AIGER format does support liveness, however, and this is planned as a future extension.

The resulting AIGER file is passed to the tool ABC [4] which uses its implementation of property driven reachability [10] to check the property. The counterexample, if any, is converted back to a run of the abstract transition system. The propositional symbols in this run are converted back to the corresponding atoms by inverting the abstraction mapping A. This yields an *abstract counterexample*: a sequence of predicate valuations that correspond to both the state and temporary symbols in the abstraction.

The abstract counterexample may be spurious in the sense that it corresponds to no run of the concrete transition system. In this case, the user must analyze the trace to determine where necessary information was lost and either modify the invariant or refine the abstraction by adding a prophecy variable.

#### **6 Case Studies**

In this section, we consider the proof of safety properties of four parameterized algorithms and protocols. We wish to address three main questions. First, is the abstraction approach efficient? That is, if we construct an abstract model using schema-based theory explication, can the resulting finite-state problem be solved by a modern symbolic model checker? Second, is the methodology usable? That is, can a human user construct a proof using the methodology by analyzing the abstract counterexamples? Third, when is it more effective than the current best alternative, which is to write an inductive invariant manually and check it using an SMT solver, as in [11]? We will call this approach "invariant checking". We note that predicate abstraction is not suited to these examples, because the invariants require complex quantified formulas, while current methods that synthesize quantified invariants for parameterized systems are unreliable in practice and do not scale well.

The last question in particular has not been well addressed in prior work on model checking approaches to parameterized verification. In most cases, either no comparison was made, or comparison was made to proofs using general-purpose proof assistants, which tend to be extremely laborious and do not make use of current state-of-the-art proof automation techniques. To make a reasonably direct comparison, we construct proofs of each model using both methodologies, in the same language and tool, using the state-of-the-art tools ABC [4] for model checking and Z3 [7] for invariant checking.

To apply the invariant checking method, some of the protocol models have been slightly re-encoded. In particular, it is helpful in some cases to use relations rather than functions in modeling the protocol state, as this can prevent the prover from diverging in a "matching loop" [8]. This re-encoding adds negligibly to the proof effort and is arguably harmless, since it does not appear in practice to affect the difficulty of refining the model to a concrete implementation.

Our four example models are:



**Table 1.** Comparison of proofs using two methodologies.

A comparison of the proofs obtained using the two methodologies is shown in Table 1. The column "size" shows the textual size of the model plus property in lexical tokens. The columns labeled |Inv| give the size of the auxiliary invariants used in the proofs, expressed in the number of lexical tokens, not including the property to be proved. Since both methods require the user to supply auxiliary invariants, and discovering these invariants is the largest part of the effort in both cases, this number provides a fairly direct comparison of the complexity of the proofs. In both methodologies, the user also defines history or "ghost" variables that help in expressing the invariant. The number of these variables is shown in the columns labeled HVars. In the model checking approach, the user also refines the abstraction by defining prophecy variables. These were not used in the invariant checking proofs. The closest analogy in invariant checking proofs to this type of information would be quantifier instantiations or triggers provided by the user. This was not needed, however, since the methodology of [22] was applied to ensure that all verification conditions reside in a decidable fragment of the logic. For the model checking methodology, the number of distinct terms supplied by the user as prophecy variables is shown in the column labeled PVars. The time columns show the total time in seconds for model checking or invariant checking for the completed proofs on a 2.6 GHz Intel Xeon CPU using one core. Times to produce counterexamples were generally shorter.

When measuring the overall complexity of the proofs, it is unclear how to weight the three kinds of information supplied by the user. In a sense, prophecy variables are the easiest to handle, since their behavior is monotone. That is, adding a prophecy variable only increases precision so it cannot cause passing invariants to fail. Ghost variables are more conceptually difficult to introduce, since the invariants depend on them. If a ghost variable definition is changed to repair a failing invariant, this may cause a different invariant to fail. Similarly if we strengthen a passing invariant, it may fail to be proved and if we weaken a failing one it may cause other formerly passing invariants to fail. This instability can cause the manual proof search to fail to converge and is the chief cause of conceptual difficulty in constructing proofs in both methodologies. Having said this, for lack of a principled way to weight the different aspects of the proof effort, we will measure the proof size as simply the sum of the number of lexical tokens in the auxiliary invariant, the history variable definitions, and all terms used as prophecy variables. The total proof size is shown in the columns labeled |Pf|.

These numbers should be taken as unreliable for several reasons that are common to any attempt to measure the effectiveness of a proof methodology. First, the size of the proof (or any other measure of the proof difficulty, such as expended time) can depend on the proficiency of the user in the particular methodology. Even if the same user produces both proofs, the user's proficiency in the two methodologies may differ, and knowledge gained in the first proof will affect the second one. Since resources were not available to train and test a statistically significant population of users in both methodologies (assuming such could be found), the numbers presented here should not be considered a direct comparison of the methods. Rather, they are presented to support some observations made below about the specific case studies and proofs.

**Case Study: Tomasulo's Algorithm.** This is a simple abstract model of a processor microarchitecture that executes instructions concurrently out of order. The model state consists of a register file, a set of reservation stations (RS) and a set of execution units (EU) and is parameterized on the size of each of these, as well as the data word size. The machine's instructions are register-to-register and are modeled abstractly by an uninterpreted function. Each register has a flag that records whether it is the destination of a pending instruction. If so, its tag indicates which RS is holding that instruction. Each RS stores the tags of its instruction arguments, and waits for these to be computed before issuing the instruction to an EU.

Both proofs are based on history variables that record the correct values of the arguments and result for each RS. The principal invariant of both proofs states that the arguments obtained by all RS's are correct. In the model checking case, the abstraction is refined by making the tags of these arguments and the chosen EU into prophecy variables. This allows the model checker to track enough state information to prove the main invariant, though one additional "non-interference" lemma is needed to guarantee that other EU's do not interfere by producing an incorrect tag. An interesting aspect of the invariant is that it does not refer to the states of the register file or EU's. The necessary invariants of these structures can be inferred by the model checker. On the other hand, this information must be supplied explicitly in the manual invariant. As the table shows, the resulting invariant is more complex.

**Case Study: German's Cache Protocol.** This simple distributed directory-based cache coherence protocol allows the caches to communicate directly only with the directory. The property proved is coherence, in effect that exclusive copies are exclusive. In the model checking proof, there is one non-interference lemma, stating that no cache produces a spurious invalidation acknowledgment message. No extra prophecy variables are needed, as tracking the state of just the two caches that produce the coherence failure suffices. The manual invariant, on the other hand, is much more detailed, in fact about an order of magnitude larger. This is because it must relate the state of all the various types of messages in the network to the cache and directory states. These relationships were inferred automatically by the model checker, resulting in a much simpler proof.

**Case Study: FLASH Cache Coherence Protocol.** This is a much more complex (and realistic) distributed cache coherence protocol model. The increased protocol complexity derives from the fact that information can be transferred directly from one cache to another. In a typical transaction, a cache sends a request to the directory for (say) an exclusive copy of a cache line. The directory forwards the request to the current owner of the line, which then sends a copy to the original requester, as well as a response to the directory confirming the ownership transfer. Handling various race conditions in this scheme makes both the protocol and its proof complex. Again the property proved is coherence. The model checking proof is similar to [19], though in that work data correctness and liveness were also proved.

In this case, three non-interference lemmas are used in the model checking proof, ruling out three types of spurious messages. Also two additional prophecy variables are needed. For example, one of these identifies the cache that sent an exclusive copy. This allows the abstraction to track the state of the third participant in the triangular transaction described above. Generally, protocols with more complex communication patterns require more prophecy variables to refine the abstraction.

As with German's protocol, and for the same reason, the manual invariant is an order of magnitude larger. In this case, the additional protocol complexity makes it quite challenging to converge to an invariant and a large number of strengthenings and weakenings were needed.

**Case Study: Virtually Synchronous Paxos.** This is a high-level model of a distributed consensus protocol, designed to allow a collection of processes to agree on a sequence of decisions, despite process and network failures. This model was previously proved consistent by a manual invariant, meaning that two decisions for a given index never disagree [21].

The protocol operates in a sequence of epochs, each of which has a leader process. The leader proposes decision values, and any proposal that receives votes from a majority of processes becomes a decision. When the leader fails, the protocol must move on to a new epoch. For consistency, any decisions that are possibly made in the old epoch must be preserved in the new one. This is accomplished by choosing a majority of processes to start the new epoch and preserving all of their votes. Any decision having a majority of votes in the old epoch must have one voter in the new epoch's starting majority and thus must be preserved. The choice of an epoch's starting majority is itself a single-decree consensus problem. This is solved in a sequence of rounds called "stakes". A stake can be created by a majority of processes and proposes the votes of some majority to be carried to the next epoch. Each process in the stake promises not to accept any lesser stake with differing votes. If a majority accepts the stake, then the votes of that stake can be passed to the next epoch.

The important auxiliary invariants of the model checking proof are these:


Perhaps not surprisingly, the manual invariant is much larger. The model checking proof, however, requires many extra prophecy variables. This is mainly accounted for by the fact that the model has seven unbounded sorts: process id's, decision indices, decision values, epochs, stakes, vote sets and process sets. Typically each invariant (including the one to be proved) requires one or two prophecy variables of each sort to refine the abstraction (though some of these may not be unique).

An additional complication is dealing with sets and majorities. Sets of processes are represented by an abstract data type. This type provides a predicate called 'majority' that indicates that a set contains more than half of the process id's. A function 'common' returns a common element between two sets if both are majorities (and is otherwise undefined). For example, to prove that we cannot have two conflicting decisions, we use the majorities that voted for each decision and declare the common process between these majorities as a prophecy variable. It then suffices to show that this particular process cannot have voted for both decisions (which requires the auxiliary invariants above). Since majorities are used in several places in the protocol, this tactic is applied several times.
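The majority-intersection tactic can be illustrated with a small sketch (illustrative Python; the actual model uses an abstract data type, and these function names are ours):

```python
from itertools import combinations

def majority(s, universe):
    """The 'majority' predicate: s contains more than half of the process ids."""
    return len(s) > len(universe) / 2

def common(s1, s2, universe):
    """The 'common' function: some element shared by two majorities
    (undefined -- here an error -- if either set is not a majority)."""
    assert majority(s1, universe) and majority(s2, universe)
    return next(iter(s1 & s2))  # two majorities always intersect

# Sanity check: every pair of majorities of a 5-process universe intersects,
# so 'common' is well defined whenever both arguments are majorities.
universe = frozenset(range(5))
subsets = [frozenset(c) for n in range(6) for c in combinations(universe, n)]
majorities = [s for s in subsets if majority(s, universe)]
assert all(s1 & s2 for s1 in majorities for s2 in majorities)
```

In the proof, the process returned by 'common' for the two voting majorities is the one declared as a prophecy variable.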

Because of the larger number of prophecy variables, our (admittedly arbitrary) measure of overall proof complexity does not show as much advantage for model checking in this protocol as it does for the cache protocols. In fact, getting the details right in this proof was subjectively much more difficult than for FLASH.

This difficulty may be related to the two sorts in the model that are totally ordered: epochs and stakes. For these sorts we use the schemata for totally ordered sets detailed in Sect. 4. The ordering of these sorts introduces some difficulty in the proof, requiring more detailed invariants. For example, suppose we want to show that the first invariant above holds at the moment when a given process leaves one epoch and enters the next. The votes received at the epoch depend on all the previous epochs. We cannot, however, make all of the unboundedly many lesser epochs concrete by adding a finite number of prophecy variables. This means our property must be inductive over epochs, that is, it holds now if it held in the past at the start of some *particular* epoch we can identify (perhaps the previous one). The need to write invariants that are inductive over ordered datatypes may account for the fact that the VS-Paxos invariant is more complex than that of the more complex FLASH protocol.

**Discussion.** We can make several general observations about these case studies. First, the performance of the finite-state model checker was never problematic. It always produced results in a reasonable amount of time and was not the bottleneck in constructing any of the proofs. Rather the most time-consuming task was usually analyzing the abstract counterexamples. This task proved tractable in practice, allowing the proof search process to converge.

Second, the invariants used in the model checking approach are generally much smaller than the manual ones because of the model checker's ability to infer state invariants.

This advantage may be somewhat offset by the need to provide prophecy variables to refine the abstraction, especially in the case where there are many unbounded sorts. Moreover, the need to write properties that are inductive over ordered sorts may lessen the advantage of model checking in invariant complexity. This was evident in the case of VS-Paxos and to some extent in Tomasulo as well, because of the implicit induction over the instruction stream. These criteria may be helpful in deciding which approach to take to a given proof problem.

Finally, it is interesting to note that the schemata presented in Sect. 4 proved adequate in all cases. That is, in no case was it necessary to add a schema to refine the abstraction of the transition relation. This indicates there is no need in practice to restrict to decidable logics or pay the cost of computing best transformers.

#### **7 Conclusion**

We have presented a method of abstracting parameterized or infinite-state SMC problems to finite-state problems based on propositional skeletons and eager theory explication. The method is extensible in the sense that users can add abstractions (or refine existing abstractions) by providing axiom schemata. It generalizes the 'datatype reduction' approach of [18] while giving both a simpler theoretical account and allowing a simpler implementation. Compared to predicate abstraction, it has the advantage that it can be applied to undecidable logics and does not require a costly decision procedure in the loop. The approach has been implemented in the IVy tool. Based on some case studies, we found that the approach is practical and requires substantially less complex auxiliary invariants than inductive invariant checking. We identified some conditions under which the approach is likely to be most effective.

Conceivably some of the tasks performed here by a human could be automated. However, the resulting system would be liable to fail unpredictably and opaquely. The present approach is an attempt to create a usable trade-off between human input and reliability.

The next step is to implement liveness. Recent work has constructed liveness proofs in IVy by an infinite-state liveness-to-safety reduction, but the proofs are complex [21]. It would be interesting to compare this to an approach that leverages a finite-state model checker's ability to prove liveness.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Program Analysis Using Polyhedra

### **Fast Numerical Program Analysis with Reinforcement Learning**

Gagandeep Singh(B), Markus Püschel, and Martin Vechev

Department of Computer Science, ETH Zürich, Zürich, Switzerland {gsingh,pueschel,martin.vechev}@inf.ethz.ch

**Abstract.** We show how to leverage reinforcement learning (RL) in order to speed up static program analysis. The key insight is to establish a correspondence between concepts in RL and those in analysis: a state in RL maps to an abstract program state in analysis, an action maps to an abstract transformer, and at every state, we have a set of sound transformers (actions) that represent different trade-offs between precision and performance. At each iteration, the agent (analysis) uses a policy learned offline by RL to decide on the transformer which minimizes loss of precision at fixpoint while improving analysis performance. Our approach leverages the idea of online decomposition (applicable to popular numerical abstract domains) to define a space of new approximate transformers with varying degrees of precision and performance. Using a suitably designed set of features that capture key properties of abstract program states and available actions, we then apply Q-learning with linear function approximation to compute an optimized context-sensitive policy that chooses transformers during analysis. We implemented our approach for the notoriously expensive Polyhedra domain and evaluated it on a set of Linux device drivers that are expensive to analyze. The results show that our approach can yield massive speedups of up to two orders of magnitude while maintaining precision at fixpoint.

#### **1 Introduction**

Static analyzers that scale to real-world programs yet maintain high precision are difficult to design. Recent approaches to attacking this problem have focused on two complementary methods. On one hand is work that designs clever algorithms that exploit the special structure of particular abstract domains to speed up analysis [5,10,15,16,20,21]. These works tackle specific types of analyses, but the gains in performance can be substantial. On the other hand are approaches that introduce creative mechanisms to trade off precision loss for gains in speed [9,12,18,19]. While promising, these methods typically do not take into account the particular abstract states arising during analysis, which determine the precision of abstract transformers (e.g., join), resulting in suboptimal analysis precision or performance. A key challenge then is coming up with effective and general approaches that can decide where and how to lose precision *during analysis* for the best trade-off between performance and precision.

**Our Work.** We address the above challenge by offering a new approach for dynamically losing precision based on reinforcement learning (RL) [24]. The key idea is to learn a policy that determines when and how the analyzer should lose the least precision at an abstract state to achieve best performance gains. Towards that, we establish a correspondence between concepts in static analysis and RL, which demonstrates that RL is a viable approach for handling choices in the inner workings of a static analyzer.

To illustrate the basic idea, imagine that a static analyzer has at each program state two available abstract transformers: the precise but slow $T\_p$ and the fast but less precise $T\_f$. Ideally, the analyzer would decide adaptively at each step on the best choice that maximizes speed while producing a final result of sufficient precision. Such a policy is difficult to craft by hand, and hence we propose to leverage RL to discover the policy automatically.

To explain the connection with RL intuitively, we think of abstract states and transformers as analogous to states of a Go board and moves made by the Go player, respectively. In Go, the goal is to learn a policy that at each state decides on the next player action (transformer to use) which maximizes the chances of eventually winning the game (obtaining a precise fixpoint while improving performance in our case). Note that the reward to be maximized in Go is long-term and not an immediate gain in position, which is similar to iterative static analysis. To learn the policy with RL, one typically extracts a set of features φ from a given state and action, and uses those features to define a so-called Q-function, which is then learned, determining the desired policy.

In the example above, a learned policy would determine at each step whether to choose action $T\_p$ or $T\_f$. To do that, for a given state and action, the analyzer computes the value of the Q-function using the features φ. Querying the Q-function returns the suggested action from that state. Eventually, such a policy would ideally lead to a fixpoint of sufficient precision that is computed more quickly.

While the overall connection between static analysis and reinforcement learning is conceptually clean, the details of making it work in practice pose significant challenges. The first is the design of suitable approximations to actually be able to gain performance when precision is lost. The second is the design of features φ that are cheap to compute yet expressive enough to capture key properties of abstract states. Finally, a suitable reward function combining both precision and performance is needed. We show how to solve these challenges for Polyhedra analysis.

**Main Contributions.** Our main contributions are:

– A space of sound, approximate Polyhedra transformers spanning different precision/performance trade-offs. The new transformers combine online decomposition with different constraint removal and merge strategies for approximations (Sect. 3).


We believe the reinforcement learning based approach outlined in this work can be applied to speed up other program analyzers (beyond Polyhedra).

#### **2 Reinforcement Learning for Static Analysis**

In this section we first introduce the general framework of reinforcement learning and then discuss its instantiation for static analysis.

#### **2.1 Reinforcement Learning**

Reinforcement learning (RL) [24] involves an *agent* learning to achieve a goal by interacting with its *environment*. The agent starts from an initial representation of its environment in the form of an initial state $s\_0 \in \mathcal{S}$, where $\mathcal{S}$ is the set of possible states. Then, at each time step $t = 0, 1, 2, \dots$, the agent performs an action $a\_t \in \mathcal{A}$ in state $s\_t$ ($\mathcal{A}$ is the set of possible actions) and moves to the next state $s\_{t+1}$. The agent receives a numerical reward $r(s\_t, a\_t, s\_{t+1}) \in \mathbb{R}$ for moving from state $s\_t$ to $s\_{t+1}$ through action $a\_t$. The agent repeats this process until it reaches a final state. Each sequence of states and actions from an initial state to the final state is called an *episode*.

In RL, state transitions typically satisfy the Markov property: the next state $s\_{t+1}$ depends only on the current state $s\_t$ and the action $a\_t$ taken from $s\_t$. A *policy* $p\colon \mathcal{S} \to \mathcal{A}$ is a mapping from states to actions: it specifies the action $a\_t = p(s\_t)$ that the agent will take when in state $s\_t$. The agent's goal is to learn a policy that maximizes not an immediate but a cumulative reward for its actions in the long term. The agent does this by selecting the action with the highest expected long-term reward in a given state. The quality function (Q-function) $Q\colon \mathcal{S} \times \mathcal{A} \to \mathbb{R}$ specifies the long-term cumulative reward associated with choosing an action $a\_t$ in state $s\_t$. Learning this function, which is not available a priori, is essential for determining the best policy and is explained next.


#### **Algorithm 1.** Q-learning algorithm

**Q-learning and Approximating the Q-function.** Q-learning [25] can be used to learn the Q-function over state-action pairs. Typically the size of the state space is so large that it is not feasible to explicitly compute the Q-function for each state-action pair and thus the function is approximated. In this paper, we consider a *linear* function approximation of the Q-function for three reasons: (i) *effectiveness*: the approach is efficient, can handle large state spaces, and works well in practice [6]; (ii) *it leverages our application domain*: in our setting, it is possible to choose meaningful features (e.g., approximation of volume and cost of transformer) that relate to precision and performance of the static analysis and thus it is not necessary to uncover them automatically (as done, e.g., by training a neural net); and (iii) *interpretability of policy*: once the Q-function and associated policy are learned they can be inspected and interpreted.

The Q-function is described as a linear combination of basis functions $\phi\_i\colon \mathcal{S} \times \mathcal{A} \to \mathbb{R}$, $i = 1, \dots, \ell$. Each $\phi\_i$ is a feature that assigns a value to a (state, action) pair, and $\ell$ is the total number of chosen features. The choice of features is important and depends on the application domain. We collect the feature functions into a vector $\phi(s, a) = (\phi\_1(s, a), \phi\_2(s, a), \dots, \phi\_\ell(s, a))$; doing so, the Q-function has the form:

$$Q(s, a) = \sum\_{j=1}^{\ell} \theta\_j \cdot \phi\_j(s, a) = \phi(s, a) \cdot \theta^T,\tag{1}$$

where $\theta = (\theta\_1, \theta\_2, \dots, \theta\_\ell)$ is the parameter vector. The goal of Q-learning with linear function approximation is thus to estimate (learn) $\theta$.

Algorithm 1 shows the Q-learning procedure. In the algorithm, $0 \le \gamma < 1$ is the *discount factor*, which represents the difference in importance between immediate and future rewards. $\gamma = 0$ makes the agent consider only immediate rewards, while $\gamma \approx 1$ gives more importance to future rewards. The parameter $0 < \alpha \le 1$ is the *learning rate*, which determines the extent to which newly acquired information overrides old information. The algorithm first initializes $\theta$ randomly. Then, for each step $t$ in an episode, the agent takes an action $a\_t$,


**Table 1.** Mapping of RL concepts to Static analysis concepts.

moves to the next state $s\_{t+1}$ and receives a reward $r(s\_t, a\_t, s\_{t+1})$. Line 12 in the algorithm shows the equation for updating the parameters $\theta$. Notice that Q-learning is an off-policy learning algorithm, as the update in the equation assumes that the agent follows a greedy policy (from state $s\_{t+1}$) while the action $a\_t$ taken by the agent (in $s\_t$) need not be greedy.
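As a hedged illustration of how the pieces fit together (the linear approximation of Eq. (1), a parameter update of the kind on line 12 of Algorithm 1, and the greedy policy of Eq. (2); all function names are ours, and no exploration strategy is shown):

```python
def q_value(theta, phi):
    """Q(s, a) = phi(s, a) . theta -- the linear approximation of Eq. (1)."""
    return sum(t * f for t, f in zip(theta, phi))

def q_update(theta, phi_sa, reward, next_phis, alpha=0.1, gamma=0.9):
    """One off-policy Q-learning step: theta moves along the TD error
    r + gamma * max_a' Q(s', a') - Q(s, a), scaled by the feature vector."""
    best_next = max((q_value(theta, p) for p in next_phis), default=0.0)
    td_error = reward + gamma * best_next - q_value(theta, phi_sa)
    return [t + alpha * td_error * f for t, f in zip(theta, phi_sa)]

def greedy_policy(theta, phis_by_action):
    """p*(s) = argmax_a Q(s, a) -- Eq. (2), computed on the fly per state."""
    return max(phis_by_action, key=lambda a: q_value(theta, phis_by_action[a]))
```

Note that the update uses the value of the greedy successor action even if the action actually taken was not greedy, which is exactly what makes Q-learning off-policy.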

Once the Q-function is learned, a policy $p^\*$ for maximizing the agent's cumulative reward is obtained as:

$$p^\*(s) = \mathbf{argmax}\_{a \in \mathcal{A}} Q(s, a). \tag{2}$$

In the application, $p^\*$ is computed on the fly at each state $s$ by computing $Q(s, a)$ for each action $a$ and choosing the one with maximal $Q(s, a)$. Since the number of actions is typically small, this incurs little overhead.

#### **2.2 Instantiation of RL to Static Analysis**

We now discuss a general recipe for instantiating the RL framework described above to the domain of static analysis. The precise formal instantiation to the specific numerical (Polyhedra) analysis is provided later.

In Table 1, we show a mapping between RL and program analysis concepts. Here, the analyzer is the agent that observes its environment, which is the abstract program state (e.g., polyhedron) arising at every iteration of the analysis. In general, the number of possible abstract states can be very large (or infinite) and thus, to enable RL in this setting, we abstract the state through a set of features (Table 2). An example of a feature could be the number of bounded program variables or the volume of a polyhedron. The challenge is to define the features to be fast to evaluate, yet sufficiently representative so the policy derived through learning generalizes well to unseen abstract program states.

Further, at every abstract state, the analyzer should have the choice between different actions corresponding to different abstract transformers. The transformers should range from expensive and precise to cheap and approximate. The reward function r is thus composed of a measure of precision and speed and should encourage approximations that are both precise and fast.

The goal of our agent is then to learn an approximation policy that at each step selects an action that tries to minimize the loss of analysis precision at fixpoint, while gaining overall performance. Learning such a policy is typically done offline using a given dataset D of programs (discussed in the evaluation). However, this is computationally challenging because the dataset D can contain many programs and each program will need to be analyzed many times over during training: even a single run of the analysis can contain many (e.g., thousands of) calls to abstract transformers. Moreover, a good heuristic may be a complicated function of the chosen features. Hence, to improve the efficiency of learning in practice, one would typically exercise the choice between multiple transformers/actions only at certain program points. A good choice, and one we employ, are join points, where the most expensive transformer in numerical domains usually occurs.

Another key challenge lies in defining a suitable space of transformers. As we will see later, we accomplish this by leveraging recent advances in online decomposition for numerical domains [20–22]. We show how to do that for the notoriously expensive Polyhedra analysis; however, the approach is easily extendable to other popular numerical domains, which all benefit from decomposition.

#### **3 Polyhedra Analysis and Approximate Transformers**

In this section we first provide brief background on polyhedra analysis and online decomposition, a recent technique to speed up analysis *without losing precision* and applicable to all popular numerical domains [22]. Then we leverage online decomposition to define a flexible approximation framework that *loses precision* in a way that directly translates into performance gains. This framework forms the basis for our RL approach discussed in Sect. 4.

#### **3.1 Polyhedra Analysis**

Let $\mathcal{X} = \{x\_1, x\_2, \dots, x\_n\}$ be the set of $n$ (numerical) program variables, where each variable $x\_i \in \mathbb{Q}$ takes a rational value. An abstract element $P \subseteq \mathbb{Q}^n$ in the Polyhedra domain is a conjunction of linear constraints $\sum\_{i=1}^{n} a\_i x\_i \le c$ between the program variables, where $a\_i \in \mathbb{Z}$ and $c \in \mathbb{Q}$. This is called the *constraint* representation of the polyhedron.

**Constraints and Generator Representation.** For efficiency, it is common to maintain, besides the constraint representation, also the *generator* representation, which encodes a polyhedron as the convex hull of a finite set of vertices, rays, and lines. Rays and lines are represented by their direction. Thus, by abuse of prior notation, we write $P = (\mathcal{C}\_P, \mathcal{G}\_P)$, where $\mathcal{C}\_P$ is the constraint representation (before just called $P$) and $\mathcal{G}\_P$ is the generator representation.

**Fig. 1.** Two representations of polyhedron $P$: as a conjunction of 4 constraints $\mathcal{C}\_P$, and as the convex hull of 3 vertices and 2 rays $\mathcal{G}\_P$.

**Example 1.** *Figure 1 shows an example of the two representations of an abstract element $P$ in the Polyhedra domain. $\mathcal{C}\_P$ is the intersection of 4 linear constraints:*

$$\mathcal{C}\_P = \{-x\_1 \le -2,\; -x\_2 \le -2,\; x\_2 \le 10,\; 3x\_2 - 5x\_1 \le 5\}.$$

*$\mathcal{G}\_P$ is the convex hull of 3 vertices and 2 rays:*

$$\mathcal{G}\_P = \{\mathit{vertices}, \mathit{rays}, \mathit{lines}\} = \{\{(2,2), (2,5), (5,10)\}, \{(1,0), (1,0)\}, \emptyset\}.$$

*Notice that $\mathcal{G}\_P$ contains two rays in the same direction $(1, 0)$; thus one of them could be removed without changing the set of points in $P$.*
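As a quick sanity check of Example 1 (standalone illustrative Python, not tied to any Polyhedra library), one can verify that every vertex of the generator representation satisfies the constraint representation, and that moving along the rays stays inside:

```python
# Constraints a.x <= c from Example 1, as (coefficients, bound) pairs.
C_P = [((-1, 0), -2),   # -x1 <= -2
       ((0, -1), -2),   # -x2 <= -2
       ((0, 1), 10),    #  x2 <= 10
       ((-5, 3), 5)]    # 3x2 - 5x1 <= 5

vertices = [(2, 2), (2, 5), (5, 10)]
rays = [(1, 0), (1, 0)]  # same direction twice: one is redundant

def satisfies(point, constraints):
    """True if the point satisfies every linear constraint a.x <= c."""
    return all(sum(a * x for a, x in zip(coeffs, point)) <= c
               for coeffs, c in constraints)

# Every vertex of G_P lies inside C_P ...
assert all(satisfies(v, C_P) for v in vertices)
# ... and moving from any vertex along any ray stays inside C_P.
assert all(satisfies(tuple(v + 10 * r for v, r in zip(vert, ray)), C_P)
           for vert in vertices for ray in rays)
```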

During analysis, the abstract elements are manipulated with abstract transformers that model the effect of statements and control flow in the program, such as assignment, conditional, join, and others. Upon termination of the analysis, each program statement has an associated polyhedron $P$ containing all possible variable values after this statement. The main bottleneck of Polyhedra analysis is the join transformer ($\sqcup$), and thus it is the focus of our approximations.

Recently, Polyhedra domain analysis was sped up by orders of magnitude, without approximation, using the idea of online decomposition [21]. The basic idea is to dynamically decompose the occurring abstract elements into independent components (in essence abstract elements on smaller variable sets) based on the connectivity between variables in the constraints, and to maintain this (permanently changing) decomposition during analysis. The finer the decomposition, the faster the analysis.

Our approximation framework builds on online decomposition. The basic idea is simple: we approximate by dropping constraints to reduce connectivity among constraints and thus to yield finer decompositions of abstract elements. These directly translate into speedup. We consider various options of such approximation; reinforcement learning (in Sect. 4) will then learn a proper, context-sensitive strategy that stipulates when and which approximation option to apply.

Next, we provide brief background on the ingredients of online decomposition and explain our mechanisms for soundly approximating the join transformer.

#### **3.2 Online Decomposition**

Online decomposition is based on the observation that during analysis, the set of variables $\mathcal{X}$ in a given polyhedron $P$ can be partitioned as $\pi\_P = \{\mathcal{X}\_1, \dots, \mathcal{X}\_r\}$ into *blocks* $\mathcal{X}\_t$, such that constraints exist only between variables in the same block. Each unconstrained variable $x\_i \in \mathcal{X}$ yields a singleton block $\{x\_i\}$. Using this partition, $P$ can be decomposed into a set of smaller polyhedra $P(\mathcal{X}\_t)$ called *factors*. As a consequence, the abstract transformer can now be applied only on the small subset of factors relevant to the program statement, which translates into better performance.

**Example 2.** *Consider the set $\mathcal{X} = \{x\_1, x\_2, x\_3, x\_4, x\_5, x\_6\}$ and the polyhedron:*

$$P = \{2x\_1 - 3x\_2 + x\_3 + x\_4 \le 0, x\_5 = 0\}.$$

*Here, $\pi\_P = \{\{x\_1, x\_2, x\_3, x\_4\}, \{x\_5\}, \{x\_6\}\}$ is a possible partition of $\mathcal{X}$ with factors*

$$P(\mathcal{X}\_1) = \{2x\_1 - 3x\_2 + x\_3 + x\_4 \le 0\}, \ P(\mathcal{X}\_2) = \{x\_5 = 0\}, \ P(\mathcal{X}\_3) = \emptyset.$$
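The partition of Example 2 can be computed with a small union-find over variables that co-occur in constraints (an illustrative sketch with names of our own choosing; the actual online decomposition maintains partitions incrementally during analysis):

```python
def partition(variables, constraints):
    """Finest partition of the variables: two variables share a block iff they
    are connected through constraints; unconstrained variables end up in
    singleton blocks. Each constraint is given as the set of its variables."""
    parent = {x: x for x in variables}

    def find(x):
        while parent[x] != x:          # path-halving union-find
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for con in constraints:
        vs = list(con)
        for v in vs[1:]:               # link all variables of the constraint
            parent[find(vs[0])] = find(v)

    blocks = {}
    for x in variables:
        blocks.setdefault(find(x), set()).add(x)
    return sorted(map(frozenset, blocks.values()), key=min)

# Example 2: P = {2x1 - 3x2 + x3 + x4 <= 0, x5 = 0} over x1..x6.
X = ['x1', 'x2', 'x3', 'x4', 'x5', 'x6']
cons = [{'x1', 'x2', 'x3', 'x4'}, {'x5'}]
assert partition(X, cons) == [frozenset({'x1', 'x2', 'x3', 'x4'}),
                              frozenset({'x5'}), frozenset({'x6'})]
```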

The set of partitions of $\mathcal{X}$ forms a lattice with the ordering $\pi \sqsubseteq \pi'$ iff every block of $\pi$ is a subset of a block of $\pi'$. The upper and lower bounds of two partitions $\pi\_1, \pi\_2$, i.e., $\pi\_1 \sqcup \pi\_2$ and $\pi\_1 \sqcap \pi\_2$, are defined accordingly.

The optimal (finest) partition for an element $P$ is denoted by $\pi\_P$. Ideally, one would always determine and maintain this finest partition for each output $Z$ of a transformer, but it may be too expensive to compute. Thus, the online decomposition in [20,21] often computes a (cheaply computable) *permissible* partition $\overline{\pi}\_Z \sqsupseteq \pi\_Z$. Note that making the output partition coarser (while keeping the same constraints) does not change the precision of the abstract transformer.

#### **3.3 Approximating the Polyhedra Join**

Let $\overline{\pi}\_{\text{com}} = \overline{\pi}\_{P\_1} \sqcup \overline{\pi}\_{P\_2}$ be a common permissible partition for the inputs $P\_1, P\_2$ of the join transformer. Then, from [21], a permissible partition for the (not approximated) output is obtained by keeping all blocks $\mathcal{X}\_t \in \overline{\pi}\_{\text{com}}$ for which $P\_1(\mathcal{X}\_t) = P\_2(\mathcal{X}\_t)$ in the output partition $\overline{\pi}\_Z$, and fusing all remaining blocks into one. Formally, $\overline{\pi}\_Z = \{\mathcal{N}\} \cup \mathcal{U}$, where

$$\mathcal{N} = \bigcup \{ \mathcal{X}\_k \in \overline{\pi}\_{\text{com}} : P\_1(\mathcal{X}\_k) \neq P\_2(\mathcal{X}\_k) \}, \quad \mathcal{U} = \{ \mathcal{X}\_k \in \overline{\pi}\_{\text{com}} : P\_1(\mathcal{X}\_k) = P\_2(\mathcal{X}\_k) \}.$$

The join transformer computes the generators $\mathcal{G}\_Z$ for the output $Z$ as $\mathcal{G}\_Z = \mathcal{G}\_{P\_1}(\mathcal{X} \setminus \mathcal{N}) \times (\mathcal{G}\_{P\_1}(\mathcal{N}) \cup \mathcal{G}\_{P\_2}(\mathcal{N}))$, where $\times$ is the Cartesian product. The constraint representation $\mathcal{C}\_Z$ is computed as $\mathcal{C}\_Z = \mathcal{C}\_{P\_1}(\mathcal{X} \setminus \mathcal{N}) \cup \mathrm{conversion}(\mathcal{G}\_{P\_1}(\mathcal{N}) \cup \mathcal{G}\_{P\_2}(\mathcal{N}))$. The conversion algorithm has worst-case exponential complexity and is the most expensive step of the join. Note that the decomposed join applies it only on the generators $\mathcal{G}\_{P\_1}(\mathcal{N}) \cup \mathcal{G}\_{P\_2}(\mathcal{N})$ corresponding to the block $\mathcal{N}$.
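A minimal sketch of how the output partition is formed (factors abstracted as plain sets of constraint strings; the function and variable names are ours):

```python
def join_partition(common_blocks, factors1, factors2):
    """Output partition of the decomposed join: blocks whose factors agree in
    both inputs are kept as-is (the set U); all other blocks are fused into a
    single block N. The expensive conversion then runs only on N."""
    N, kept = set(), []
    for block in common_blocks:
        if factors1[block] == factors2[block]:
            kept.append(block)          # U: factor unchanged, no join needed
        else:
            N |= block                  # fuse differing blocks into N
    return ([frozenset(N)] if N else []) + kept

# Toy inputs: factors over three singleton blocks; two of the factors differ.
b1, b2, b3 = frozenset({'x1'}), frozenset({'x2'}), frozenset({'x3'})
f1 = {b1: {'x1 <= 1'}, b2: {'x2 <= 2'}, b3: {'x3 <= 0'}}
f2 = {b1: {'x1 <= 3'}, b2: {'x2 <= 2'}, b3: {'x3 <= 5'}}
# b1 and b3 disagree, so they fuse into N = {x1, x3}; b2 is kept.
assert join_partition([b1, b2, b3], f1, f2) == [frozenset({'x1', 'x3'}), b2]
```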

The cost of the decomposed join transformer depends on the size of the block $\mathcal{N}$. Thus, it is desirable to bound this size by a *threshold* $\in \mathbb{N}$. Let $\mathcal{B} = \{\mathcal{X}\_k \in \overline{\pi}\_{\text{com}} : \mathcal{X}\_k \cap \mathcal{N} \ne \emptyset\}$ be the set of blocks that merge into $\mathcal{N}$ in the output $\overline{\pi}\_Z$, and $\mathcal{B}\_t = \{\mathcal{X}\_k \in \mathcal{B} : |\mathcal{X}\_k| > \mathit{threshold}\}$ be the set of blocks in $\mathcal{B}$ with size $> \mathit{threshold}$.

**Splitting of Large Blocks.** For each block $\mathcal{X}\_t \in \mathcal{B}\_t$, we apply the join on the associated factors: $Z(\mathcal{X}\_t) = P\_1(\mathcal{X}\_t) \sqcup P\_2(\mathcal{X}\_t)$. We then remove constraints from $Z(\mathcal{X}\_t)$ until it decomposes into blocks of sizes $\le \mathit{threshold}$. Since we only remove constraints from $Z(\mathcal{X}\_t)$, the resulting transformer remains sound. There are many choices for removing constraints, as shown in the next example.

**Example 3.** *Consider the following polyhedron and threshold* $= 4$:

$$\begin{aligned} \mathcal{X}\_t &= \{x\_1, x\_2, x\_3, x\_4, x\_5, x\_6\}, \\ Z(\mathcal{X}\_t) &= \{x\_1 - x\_2 + x\_3 \le 0,\; x\_2 + x\_3 + x\_4 \le 0,\; x\_2 + x\_3 \le 0, \\ &\phantom{{}= \{} x\_3 + x\_4 \le 0,\; x\_4 - x\_5 \le 0,\; x\_4 - x\_6 \le 0\}. \end{aligned}$$

*We can remove $M = \{x\_4 - x\_5 \le 0,\; x\_4 - x\_6 \le 0\}$ from $Z(\mathcal{X}\_t)$ to obtain the constraint set $\{x\_1 - x\_2 + x\_3 \le 0,\; x\_2 + x\_3 + x\_4 \le 0,\; x\_2 + x\_3 \le 0,\; x\_3 + x\_4 \le 0\}$ with partition $\{\{x\_1, x\_2, x\_3, x\_4\}, \{x\_5\}, \{x\_6\}\}$, which obeys the threshold.*

*We could also remove $M = \{x\_2 + x\_3 + x\_4 \le 0,\; x\_3 + x\_4 \le 0\}$ from $Z(\mathcal{X}\_t)$ to get the constraint set $\{x\_1 - x\_2 + x\_3 \le 0,\; x\_2 + x\_3 \le 0,\; x\_4 - x\_5 \le 0,\; x\_4 - x\_6 \le 0\}$ with partition $\{\{x\_1, x\_2, x\_3\}, \{x\_4, x\_5, x\_6\}\}$, which also obeys the threshold.*

We next discuss our choices for the constraint removal algorithm.

**Stoer-Wagner min-cut.** The first basic idea is to remove a minimal number of constraints from $Z(\mathcal{X}\_t)$ such that the block $\mathcal{X}\_t$ decomposes into two blocks. To do so, we associate with $Z(\mathcal{X}\_t)$ a weighted undirected graph $G = (V, E)$, where $V = \mathcal{X}\_t$. Further, there is an edge between $x\_i$ and $x\_j$ if there is a constraint containing both; its weight $m\_{ij}$ is the number of such constraints. We then apply the standard Stoer-Wagner min-cut algorithm [23] to obtain a partition of $\mathcal{X}\_t$ into $\mathcal{X}\_t^1$ and $\mathcal{X}\_t^2$. $M$ collects all constraints that need to be removed, i.e., those that contain at least one variable from both $\mathcal{X}\_t^1$ and $\mathcal{X}\_t^2$.

**Example 4.** Figure 2 shows the graph $G$ for $\mathcal{Z}(\mathcal{X}_t)$ in Example 3. Applying the Stoer-Wagner min-cut to $G$ once will cut off $x_5$ or $x_6$ by removing the constraint $x_4 - x_5 \leq 0$ or $x_4 - x_6 \leq 0$, respectively. In either case a block of size 5 remains, exceeding the threshold of 4. After two applications, both constraints have been removed and the resulting block structure is $\{\{x_1, x_2, x_3, x_4\}, \{x_5\}, \{x_6\}\}$. The associated factors are $\{x_1 - x_2 + x_3 \leq 0, x_2 + x_3 + x_4 \leq 0, x_2 + x_3 \leq 0, x_3 + x_4 \leq 0\}$, and $x_5, x_6$ become unconstrained.
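This split step can be sketched in a few lines of Python. This is an illustration only (the production implementation is a C library): `build_graph` constructs the weighted graph described above, and `stoer_wagner` is a textbook implementation of the min-cut algorithm of [23], reproducing Example 4's graph.

```python
from collections import defaultdict

def build_graph(constraints):
    """Edge weight m_ij = number of constraints mentioning both x_i and x_j."""
    adj = defaultdict(dict)
    for c in constraints:
        for i in range(len(c)):
            for j in range(i + 1, len(c)):
                u, v = c[i], c[j]
                adj[u][v] = adj[u].get(v, 0) + 1
                adj[v][u] = adj[v].get(u, 0) + 1
    return dict(adj)

def stoer_wagner(adj):
    """Global min-cut of a weighted undirected graph.
    Returns (cut_weight, one side of the cut as a set of vertices)."""
    # Supernodes are frozensets of original vertices, contracted as we go.
    g = {frozenset([u]): {frozenset([v]): w for v, w in nbrs.items()}
         for u, nbrs in adj.items()}
    best_w, best_side = float("inf"), None
    while len(g) > 1:
        # Maximum-adjacency search: repeatedly add the most tightly
        # connected remaining supernode.
        start = next(iter(g))
        order = [start]
        w_to = dict(g[start])
        while len(order) < len(g):
            u = max((v for v in g if v not in order),
                    key=lambda v: w_to.get(v, 0))
            order.append(u)
            for v, w in g[u].items():
                if v not in order:
                    w_to[v] = w_to.get(v, 0) + w
        t, s = order[-1], order[-2]
        cut_w = sum(g[t].values())  # cut-of-the-phase: t versus the rest
        if cut_w < best_w:
            best_w, best_side = cut_w, set(t)
        # Contract s and t into a single supernode.
        merged = s | t
        g[merged] = {}
        for v in set(g[s]) | set(g[t]):
            if v in (s, t):
                continue
            g[merged][v] = g[s].get(v, 0) + g[t].get(v, 0)
            g[v][merged] = g[merged][v]
            g[v].pop(s, None)
            g[v].pop(t, None)
        del g[s], g[t]
    return best_w, best_side
```

On Example 3's constraints the minimum cut has weight 1 and isolates $x_5$ or $x_6$, matching Example 4.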

**Weighted Constraint Removal.** Our second approach for constraint removal associates weights not with edges but with constraints, and then greedily removes constraints with high weights. Specifically, we consider the following two choices of constraint weights, yielding two different constraint removal policies:

**Fig. 2.** Graph $G$ for $\mathcal{Z}(\mathcal{X}_t)$ in Example 3


Once the weights are computed, we remove the constraint with maximum weight. The intuition is that the variables in this constraint most likely occur in other constraints in $\mathcal{Z}(\mathcal{X}_t)$ and thus do not become unconstrained upon constraint removal. This reduces the loss of information.

**Example 5.** Applying the first definition of weights to Example 3, we get $n_1 = 1, n_2 = 3, n_3 = 4, n_4 = 4, n_5 = 1, n_6 = 1$. The constraint $x_2 + x_3 + x_4 \leq 0$ has the maximum weight of $n_2 + n_3 + n_4 = 11$ and is thus chosen for removal. Removing this constraint from $\mathcal{Z}(\mathcal{X}_t)$ does not yet yield a decomposition, so we repeat; this time $x_3 + x_4 \leq 0$ is chosen. Now $\mathcal{Z}(\mathcal{X}_t) \setminus M = \{x_1 - x_2 + x_3 \leq 0, x_2 + x_3 \leq 0, x_4 - x_5 \leq 0, x_4 - x_6 \leq 0\}$, which can be decomposed into the two factors $\{x_1 - x_2 + x_3 \leq 0, x_2 + x_3 \leq 0\}$ and $\{x_4 - x_5 \leq 0, x_4 - x_6 \leq 0\}$, corresponding to the blocks $\{x_1, x_2, x_3\}$ and $\{x_4, x_5, x_6\}$, respectively, each of size $\leq$ *threshold*.
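A sketch of this greedy policy, assuming the first definition of weights implied by Example 5 (the weight of a constraint is the sum of the occurrence counts $n_i$ of its variables). The tie-breaking rule (toward later constraints) is our assumption, chosen so that the sketch reproduces Example 5:

```python
def occurrence_counts(cons):
    """n_i = number of constraints in which variable x_i occurs."""
    n = {}
    for c in cons:
        for v in c:
            n[v] = n.get(v, 0) + 1
    return n

def blocks(cons, all_vars):
    """Connected components of the 'shares a constraint' relation (union-find)."""
    parent = {v: v for v in all_vars}
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v
    for c in cons:
        for v in c[1:]:
            parent[find(v)] = find(c[0])
    comp = {}
    for v in all_vars:
        comp.setdefault(find(v), set()).add(v)
    return list(comp.values())

def remove_until_small(cons, all_vars, threshold):
    """Greedily drop the heaviest constraint until every block fits."""
    cons, removed = list(cons), []
    while max(len(b) for b in blocks(cons, all_vars)) > threshold:
        n = occurrence_counts(cons)
        # weight = sum of variable occurrence counts; ties -> later constraint
        i = max(range(len(cons)), key=lambda j: (sum(n[v] for v in cons[j]), j))
        removed.append(cons.pop(i))
    return cons, removed
```

On Example 3's constraints with threshold 4, the first removal is $x_2 + x_3 + x_4 \leq 0$ (weight 11), the second $x_3 + x_4 \leq 0$, yielding the blocks $\{x_1, x_2, x_3\}$ and $\{x_4, x_5, x_6\}$.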

**Merging Blocks.** The sizes of all blocks in $\mathcal{B} \setminus \mathcal{B}_t$ are $\leq$ *threshold*, and we can merge some of them into larger blocks $\mathcal{X}_m$ of size $\leq$ *threshold* to increase the precision of the subsequent join. The join is then applied to the factors $P_1(\mathcal{X}_m), P_2(\mathcal{X}_m)$ and the result is added to the output $Z$. We consider the following three merging strategies; to simplify the explanation, we assume that the blocks in $\mathcal{B} \setminus \mathcal{B}_t$ are ordered by ascending size:


**Example 6.** Consider *threshold* = 5 and $\mathcal{B} \setminus \mathcal{B}_t$ with block sizes $\{1, 1, 2, 2, 2, 2, 3, 5, 7, 10\}$. Merging smallest first yields blocks $1+1+2$, $2+2$, $2+3$, leaving the rest unchanged. The resulting sizes are $\{4, 4, 5, 5, 7, 10\}$. Merging large with small leaves $10, 7, 5$ unchanged and merges $3+1+1$, $2+2$, and $2+2$. The resulting sizes are also $\{4, 4, 5, 5, 7, 10\}$, but the associated factors are different (since different blocks are merged), which will yield different results in subsequent transformations.
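A minimal sketch of the "merge smallest first" strategy, operating on block sizes only (the real transformer merges the blocks and their associated factors, not just the sizes):

```python
def merge_smallest_first(sizes, threshold):
    """Greedily merge blocks in ascending size order while the merged
    size stays within the threshold."""
    merged, cur = [], 0
    for s in sorted(sizes):
        if cur and cur + s > threshold:
            merged.append(cur)  # close the current merged block
            cur = 0
        cur += s
    if cur:
        merged.append(cur)
    return sorted(merged)
```

On Example 6's sizes with threshold 5 this produces the merged sizes $\{4, 4, 5, 5, 7, 10\}$.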

**Need for RL.** Algorithm 2 shows how to approximate the join transformer. Different choices of threshold, splitting, and merge strategies yield a range of transformers whose performance and precision depend on the inputs. All of the transformers are non-monotonic; however, the analysis always converges to a fixpoint when combined with widening [2]. Determining the suitability of a given choice for a given input is highly non-trivial, and thus we use RL to learn it.


**Table 2.** Features for describing RL state s (m ∈ {1, 2}, 0 ≤ j ≤ 8, 0 ≤ h ≤ 3).


#### **4 Reinforcement Learning for Polyhedra Analysis**

We now describe how to instantiate reinforcement learning for approximating Polyhedra domain analysis. The instantiation consists of the following steps:


**States.** We consider nine features for defining a state $s$ for RL. The features $\psi_i$, their extraction complexity, and their typical range on our benchmarks are shown in Table 2. The first seven features capture the asymptotic complexity of the join [21] on the input polyhedra $P_1$ and $P_2$: the number of blocks, the distribution (maximum, minimum, and average) of their sizes, and the number of generators. The precision of the inputs is captured by the number of variables $x_i \in \mathcal{X}$ with finite upper and lower bounds, and the number of those with only a finite upper or lower bound, in both $P_1$ and $P_2$.

As shown in Table 2, each state feature $\psi_i$ returns a natural number; however, its range can be rather large, resulting in a massive state space. To ensure scalability and generalization of learning, we use bucketing to reduce the state space size, clustering states with similar precision and expected join cost. The number $n_i$ of buckets for each $\psi_i$ and their definitions are shown in the last two columns of Table 2. With bucketing, the RL state $s$ is a 9-tuple of bucket indices, where the $i$-th index indicates the bucket that $\psi_i$'s return value falls into.
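Bucketing can be implemented as a binary search over per-feature boundary lists. The boundary values below are illustrative placeholders, not the paper's actual bucket definitions from Table 2:

```python
import bisect

def bucketize(features, boundaries):
    """Map each raw feature value psi_i to the index of the bucket it
    falls into, given one sorted boundary list per feature."""
    return tuple(bisect.bisect_right(b, v) for v, b in zip(features, boundaries))

# Hypothetical boundaries: values <=1, <=5, <=20, >20 -> buckets 0..3.
BOUNDS = [[1, 5, 20]] * 9
```

For example, `bucketize((0, 3, 7, 25, 1, 5, 100, 2, 0), BOUNDS)` yields the 9-tuple of bucket indices used as the RL state.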

**Actions.** An action $a$ is a 3-tuple $(th, r\_algo, m\_algo)$ consisting of:


All three of these have been discussed in detail in Sect. 3. The *threshold* values were chosen based on performance characterization on our benchmarks. With the above, we have 36 possible actions per state.

**Reward.** After applying the approximated join transformer according to action $a_t$ in state $s_t$, we compute the precision of the output polyhedron $P_1 \sqcup P_2$ by first computing the smallest (often unbounded) box<sup>1</sup> covering $P_1 \sqcup P_2$, which has complexity $O(ng)$. We then compute the following quantities from this box:


Further, we measure the runtime in CPU cycles *cyc* for the approximate join transformer. The reward is then defined by

$$r(s\_t, a\_t, s\_{t+1}) = 3 \cdot n\_s + 2n\_b + n\_{hb} - \log\_{10}(cyc). \tag{3}$$

As the order of precision for the different interval types is singleton > bounded > half-bounded, the reward function in (3) weighs their counts by 3, 2, and 1, respectively. The reward function in (3) favors both high performance and

<sup>1</sup> A natural measure of precision is the volume of $P_1 \sqcup P_2$. However, calculating it is very expensive and $P_1 \sqcup P_2$ is often unbounded.


**Table 3.** Instantiation of Q-learning to Polyhedra static analysis.

precision. It also ensures that the precision part ($3 \cdot n_s + 2n_b + n_{hb}$) has a similar magnitude range as the performance part ($\log_{10}(cyc)$).<sup>2</sup>
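A sketch of the reward computation, assuming $n_s$, $n_b$ and $n_{hb}$ count the singleton, bounded and half-bounded intervals of the covering box, as the surrounding text suggests:

```python
import math

def reward(box, cyc):
    """r = 3*n_s + 2*n_b + n_hb - log10(cyc), where box is a list of
    (lo, hi) intervals (possibly infinite) and cyc the measured CPU cycles."""
    ns = sum(1 for lo, hi in box if lo == hi)                     # singletons
    nb = sum(1 for lo, hi in box
             if lo < hi and math.isfinite(lo) and math.isfinite(hi))  # bounded
    nhb = sum(1 for lo, hi in box
              if math.isfinite(lo) != math.isfinite(hi))          # half-bounded
    return 3 * ns + 2 * nb + nhb - math.log10(cyc)
```

For a box with one interval of each kind and $cyc = 1000$, the reward is $3 + 2 + 1 - 3 = 3$.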

**Q-function.** As mentioned before, we approximate the Q-function by a linear function (1). We define a binary feature function $\phi_{ijk}$ for each (state, action) pair: $\phi_{ijk}(s, a) = 1$ if the state component $s(i)$ lies in the $j$-th bucket and the action $a$ equals $a_k$:

$$\phi\_{ijk}(s,a) = 1 \iff s(i) = j \text{ and } a = a\_k \tag{4}$$

The Q-function is a linear combination of the state-action features $\phi_{ijk}$:

$$Q(s,a) = \sum\_{i=1}^{9} \sum\_{j=1}^{n\_i} \sum\_{k=1}^{36} \theta\_{ijk} \cdot \phi\_{ijk}(s,a). \tag{5}$$

**Q-learning.** During the training phase, we are given a dataset of programs D and we use Q-LEARN from Algorithm 1 on each program in D to perform Q-learning. Q-learning is performed with input parameters instantiated as explained above and summarized in Table 3. Each episode consists of a run of Polyhedra analysis on a benchmark in D. We run the analysis multiple times on each program in D and update the Q-function after each join by calling Q-LEARN.
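Because the features $\phi_{ijk}$ are binary indicators, $Q(s,a)$ reduces to summing one weight per state component, and the Q-learning update moves exactly those weights along the temporal-difference error. A sketch (the learning rate and discount below are illustrative, not the paper's values):

```python
def q_value(theta, s, a_idx):
    """Q(s,a) = sum of theta weights for the active indicator features
    phi_ijk: one per (feature index i, bucket s[i], action index k)."""
    return sum(theta.get((i, s[i], a_idx), 0.0) for i in range(len(s)))

def q_update(theta, s, a_idx, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One Q-learning step on the linear approximation: nudge the active
    weights by alpha times the TD error."""
    target = r + gamma * max(q_value(theta, s_next, k) for k in actions)
    td = target - q_value(theta, s, a_idx)
    for i in range(len(s)):
        key = (i, s[i], a_idx)
        theta[key] = theta.get(key, 0.0) + alpha * td
    return theta
```

Since at most one feature per state component is active, each update touches only $|s|$ of the $9 \cdot n_i \cdot 36$ weights, keeping the per-join overhead small.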

A Q-function is typically learned using an ε-greedy policy [24], where the agent takes greedy actions by exploiting the current Q-estimates while also exploring randomly. The policy requires initial random exploration to learn good Q-estimates that can later be exploited. This is infeasible for the Polyhedra analysis, as a typical episode contains thousands of join calls. Therefore, we generate actions for Q-learning by exploiting the optimal policy for precision (which always selects the precise join) and exploring performance by choosing a random approximate join, each with probability 0.5.<sup>3</sup>

<sup>2</sup> The log is used since the join has exponential complexity.

<sup>3</sup> We also tried exploitation probabilities of 0.7 and 0.9, however the resulting policies had suboptimal performance during testing due to limited exploration.

Formally, the action $a_t := p(s_t)$ selected in state $s_t$ during learning is given by $a_t = (th, r\_algo, m\_algo)$, where

$$th = \begin{cases} \mathbf{rand}() \,\%\, 4 + 1 & \text{with probability } 0.5,\\ \min\bigl(4, \bigl(\textstyle\sum_{k=1}^{|\mathcal{B}|} |\mathcal{X}_k|\bigr)/5\bigr) & \text{with probability } 0.5, \end{cases} \quad r\_algo = \mathbf{rand}() \,\%\, 3 + 1, \;\; m\_algo = \mathbf{rand}() \,\%\, 3 + 1. \tag{6}$$

**Obtaining the Learned Policy.** After learning over the dataset D, the learned approximating join transformer in state $s_t$ chooses an action according to (2) by selecting the maximal value over all actions. The values $th = 1, 2, 3, 4$ are decoded as *threshold* = 5, 10, 15, 20, respectively.
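The sampling rule (6) can be sketched as follows, reading $\mathbf{rand}() \,\%\, k$ as a uniform draw and assuming integer division in the exploitation branch:

```python
import random

def sample_action(block_sizes, rng=random):
    """Sample (th, r_algo, m_algo) as in (6): with probability 0.5 a random
    threshold index (exploration), otherwise min(4, total_vars / 5)
    (exploiting the precision-optimal choice); removal and merge
    algorithms are drawn uniformly."""
    total = sum(block_sizes)
    if rng.random() < 0.5:
        th = rng.randrange(4) + 1          # rand() % 4 + 1
    else:
        th = min(4, total // 5)            # integer division assumed
    r_algo = rng.randrange(3) + 1          # rand() % 3 + 1
    m_algo = rng.randrange(3) + 1
    return th, r_algo, m_algo
```

The exploitation branch picks the largest threshold index whose decoded *threshold* still covers the current variables, i.e., it biases learning toward the precise join.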

#### **5 Experimental Evaluation**

We implemented our approach in the form of a C-library for Polyhedra analysis, called Poly-RL. We compare the performance and precision of Poly-RL against the state-of-the-art ELINA [1], which uses online decomposition for Polyhedra analysis without losing precision. In addition, we implemented two Polyhedra analysis approximations (baselines) based on the following heuristics:


All Polyhedra implementations use 64-bit integers to encode rational numbers. In the case of overflow, the corresponding polyhedron is set to top.

**Experimental Setup.** All our experiments, including learning the parameters θ for the Q-function and the evaluation of the learned policy on unseen benchmarks, were carried out on a 2.13 GHz Intel Xeon E7-4830 CPU with 24 MB L3 cache and 256 GB memory. All Polyhedra implementations were compiled with gcc 5.4.0 using the flags -O3 -m64 -march=native.

**Analyzer.** For both learning and evaluation, we used the *crab-llvm* analyzer for C-programs, part of the larger SeaHorn [7] verification framework. The analyzer performs intra-procedural analysis of llvm-bitcode to generate Polyhedra invariants which can be used for verifying assertions using an SMT solver [11].

**Benchmarks.** SVCOMP [3] contains thousands of challenging benchmarks in different categories suited to different kinds of analysis. We chose the Linux Device Drivers (LD) category, which is known to be challenging for Polyhedra analysis [21], since proving properties in these programs requires Polyhedra invariants (rather than, say, the weaker Octagon invariants).

**Training Dataset.** We chose 70 large benchmarks for Q-learning. We ran each benchmark a thousand times over a period of three days to generate sample traces of Polyhedra analysis containing thousands of calls to the join transformer. We set a timeout of 5 minutes per run and discarded incomplete traces in case of a timeout. In total, we performed Q-learning over 110811 traces.

**Evaluation Method.** For evaluating the effectiveness of our learned policy, we then chose benchmarks based on the following criteria:


Based on these criteria, we found 11 benchmarks on which we present our results. We used a timeout of 1 h and memory limit of 100 GB for our experiments.

**Inspecting the Learned Policy.** Our learned policy chooses in the majority of cases *threshold*=20, the binary weighted constraint removal algorithm for splitting, and the merge smallest first algorithm for merging. Poly-Fixed always uses these values for defining an approximate transformer, i.e., it follows a fixed strategy. Our experimental results show that following this fixed strategy results in suboptimal performance compared to our learned policy that makes adaptive, context-sensitive decisions to improve performance.

**Results.** We measure precision as the fraction of program points at which the Polyhedra invariants generated by the approximate analysis are semantically the same as or stronger than the ones generated by ELINA. This is a less biased and more challenging measure than the number of discharged assertions [4,18,19], where one can write weak assertions that even a weaker domain can prove.

Table 4 shows the number of program points<sup>4</sup>, timings (in seconds), and the precision (in %) of Poly-RL, Poly-Fixed, and Poly-Init w.r.t. ELINA on all 11 benchmarks. In the table, the entry TO (MO) means that the analysis did not finish within 1 h (exceeded the memory limit). For an incomplete analysis, we compute the precision by comparing program points for which the incomplete analysis can produce invariants.

**Poly-RL vs ELINA.** In Table 4, Poly-RL obtains > 7x speed-up over ELINA on 6 of the 11 benchmarks with a maximum of 515x speedup for the mfd sm501 benchmark. It also obtains the same or stronger invariants on ≥ 87% of program

<sup>4</sup> The benchmarks contain up to 50K LOC but SeaHorn encodes each basic block as one program point, thus the number of points in Table 4 is significantly reduced.


**Table 4.** Timings (seconds) and precision of approximations (%) w.r.t. ELINA.

points on 8 benchmarks. Note that Poly-RL obtains both large speedups and the same invariants at all program points on 3 benchmarks.

In most cases, the widening transformer removes many of the constraints produced by ELINA's precise join transformer, which allows Poly-RL to obtain the same invariants as ELINA despite the loss of precision during join. Poly-RL produces a large number of non-comparable fixpoints on 3 benchmarks in Table 4 due to its non-monotonic join transformers.

We also tested Poly-RL on 17 benchmarks from the product lines category. ELINA did not finish within an hour on any of these benchmarks, whereas Poly-RL finished within 1 s. Poly-RL had 100% precision on the subset of program points at which ELINA produces invariants, and with Poly-RL, SeaHorn successfully discharged the assertions. We did not include these results in Table 4, as the precision w.r.t. ELINA cannot be fully compared.

**Poly-RL vs Poly-Fixed.** Poly-Fixed is never significantly more precise than Poly-RL in Table 4. Poly-Fixed is faster than Poly-RL on 4 benchmarks; however, the speedups are small. Poly-Fixed is slower than ELINA on 3 benchmarks and times out on 2 of these. This is due to the overhead of the binary weighted constraint removal algorithm and the exponential number of generators in the output.

**Poly-RL vs Poly-Init.** From (6), Poly-Init takes random actions and thus the quality of its result varies depending on the run. Table 4 shows the results on a sample run. Poly-RL is more precise than Poly-Init on all benchmarks in Table 4. Poly-Init also does not finish on 4 benchmarks.

#### **6 Related Work**

Our work can be seen as part of the general research direction on parametric program analysis [4,9,14,18,19], where one tunes the precision and cost of the analysis by adapting it to the analyzed program. The main difference is that prior approaches fix the learning parameters for a given program while our method is adaptive and can select parameters dynamically based on the abstract states encountered during analysis, yielding better cost/precision tradeoffs. Further, prior work measures precision by the number of assertions proved whereas we target the stronger notion of fixpoint equivalence.

The works of [20] and [21] improve the performance of Octagon and Polyhedra domain analysis, respectively, based on online decomposition without losing precision. We compared against [21] in this paper. As our results suggest, the performance of Polyhedra analysis can be significantly improved with RL. We believe that our approach can be easily extended to the Octagon domain for achieving speedups over the work of [20], as the idea of online decomposition applies to all sub-polyhedra domains [22].

Reinforcement learning based on linear function approximation of the Q-function has been applied to learn branching rules for SAT solvers in [13]. The learned policies achieve performance similar to that of the best branching rules. We believe that more powerful techniques for RL, such as deep Q-networks (DQN) [17] or double Q-learning [8], can be investigated to potentially improve the quality of the results produced by our approach.

#### **7 Conclusion**

Polyhedra analysis is notoriously expensive and has worst-case exponential complexity. We showed how to gain significant speedups by adaptively trading precision for performance during analysis, using an automatically learned policy. Two key insights underlie our approach. First, we identify reinforcement learning as a conceptual match to the learning problem at hand: deciding which transformers to select at each analysis step so as to achieve the eventual goal of high precision and fast convergence to a fixpoint. Second, we build on the concept of online decomposition and offer an effective method to directly translate precision loss into significant speedups. Our work focused on Polyhedra analysis, for which we provide a complete implementation and evaluation. We believe the approach can be instantiated to other forms of static analysis in future work.

**Acknowledgments.** We would like to thank Afra Amini for her help in implementing the approximate transformers. We would also like to thank the anonymous reviewers for their constructive feedback. This research was supported by the Swiss National Science Foundation (SNF) grant number 163117.

### **References**



### **A Direct Encoding for NNC Polyhedra**

Anna Becchi and Enea Zaffanella(B)

Department of Mathematical, Physical and Computer Sciences, University of Parma, Parma, Italy anna.becchi@studenti.unipr.it, enea.zaffanella@unipr.it

**Abstract.** We present an alternative Double Description representation for the domain of NNC (not necessarily closed) polyhedra, together with the corresponding Chernikova-like conversion procedure. The representation uses no slack variable at all and provides a solution to a few technical issues caused by the encoding of an NNC polyhedron as a closed polyhedron in a higher dimension space. A preliminary experimental evaluation shows that the new conversion algorithm is able to achieve significant efficiency improvements.

#### **1 Introduction**

The Double Description (DD) method [28] allows for the representation and manipulation of convex polyhedra by using two different geometric representations: one based on a finite collection of *constraints*, the other based on a finite collection of *generators*. Starting from any one of these representations, the other can be derived by application of a conversion procedure [10–12], thereby obtaining a DD pair. The procedure is incremental, capitalizing on the work already done when new constraints and/or generators need to be added to an input DD pair.

The DD method lies at the foundation of many software libraries and tools<sup>1</sup> which are used, either directly or indirectly, in research fields as diverse as bioinformatics [31,32], computational geometry [1,2], analysis of analog and hybrid systems [8,18,22,23], automatic parallelization [6,29], scheduling [16], static analysis of software [4,13,15,17,21,24].

In the classical setting, the DD method is meant to compute geometric representations for *topologically closed* polyhedra in an n-dimensional vector space. However, there are applications requiring the ability to also deal with linear *strict* inequality constraints, leading to the definition of *not necessarily closed* (NNC) polyhedra. For example, this is the case for some of the analysis tools developed for the verification of hybrid systems [8,18,22,23], static analysis tools such as Pagai [24], and tools for the automatic discovery of ranking functions [13].

The few DD method implementations providing support for NNC polyhedra (Apron and PPL) are all based on an *indirect* representation. The approach, proposed in [22,23] and studied in more detail in [3,5], encodes the strict inequality

<sup>1</sup> An incomplete list of available implementations includes cdd [19], PolyLib [27], Apron [25], PPL [4], 4ti2 [1], Skeleton [33], Addibit [20], ELINA [30].

© The Author(s) 2018

H. Chockler and G. Weissenbacher (Eds.): CAV 2018, LNCS 10981, pp. 230–248, 2018. https://doi.org/10.1007/978-3-319-96145-3_13

constraints by means of an additional space dimension, playing the role of a *slack variable*; the new space dimension, usually denoted as $\epsilon$, needs to be non-negative and bounded from above, i.e., the constraints $0 \leq \epsilon \leq 1$ are added to the topologically closed representation $\mathcal{R}$ (called the $\epsilon$-representation) of the NNC polyhedron $\mathcal{P}$. The main advantage of this approach is the possibility of reusing, almost unchanged, all of the well-studied algorithms and optimizations that have been developed for the classical case of closed polyhedra. However, the addition of a slack variable carries with it a few technical issues.
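The ε-encoding can be illustrated on constraints represented as coefficient tuples. The representation chosen here (a pair of coefficient vector and bound, with a strictness flag) is ours for illustration, not PPL's or Apron's internal format; a strict constraint $\boldsymbol{a}\cdot\boldsymbol{x} > b$ becomes the non-strict $\boldsymbol{a}\cdot\boldsymbol{x} - \epsilon \geq b$ in one extra dimension:

```python
def eps_encode(n, constraints):
    """constraints: list of (coeffs, b, strict) meaning coeffs.x >= b
    (or > b if strict). Returns non-strict constraints over n+1 dimensions,
    the last being the slack eps with 0 <= eps <= 1."""
    enc = [(tuple(a) + ((-1,) if strict else (0,)), b)
           for a, b, strict in constraints]
    enc.append(((0,) * n + (1,), 0))     # eps >= 0
    enc.append(((0,) * n + (-1,), -1))   # -eps >= -1, i.e. eps <= 1
    return enc
```

For instance, $x_1 > 0$ in two dimensions becomes $x_1 - \epsilon \geq 0$, while a non-strict constraint gets a zero ε-coefficient.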


In this paper, we pursue a different approach to the handling of NNC polyhedra in the DD method. Namely, we specify a *direct* representation, dispensing with the slack variable. The main insight of this new approach is the separation of the (constraint or generator) geometric representation into two components, the skeleton and the non-skeleton of the representation, playing quite different roles: while keeping a geometric encoding for the skeleton component, we adopt a combinatorial encoding for the non-skeleton one. For this new representation, we propose the corresponding variant of Chernikova's conversion procedure, where both components are handled by respective processing phases, so as to take advantage of their peculiarities. In particular, we develop *ad hoc* functions and procedures for the combinatorial non-skeleton part.

The new representation and conversion procedure, in principle, can be integrated into any of the available implementations of the DD method. Our experimental evaluation is conducted in the context of the PPL and shows that the new algorithm, while computing the correct results for all of the considered tests, achieves impressive efficiency improvements with respect to the implementation based on the slack variable.

The paper is structured as follows. Section 2 briefly introduces the required notation, terminology and background concepts. Section 3 proposes the new representation for NNC polyhedra; the proofs of the stated results are in [7]. The extension of Chernikova's conversion algorithm to this new representation is presented in Sect. 4. Section 5 reports the results obtained in the experimental evaluation. We conclude in Sect. 6.

#### **2 Preliminaries**

We assume some familiarity with the basic notions of lattice theory [9]. For a lattice $\langle L, \sqsubseteq, \bot, \top, \sqcap, \sqcup \rangle$, an element $a \in L$ is an *atom* if $\bot \sqsubset a$ and there exists no element $b \in L$ such that $\bot \sqsubset b \sqsubset a$. For $S \subseteq L$, the *upward closure* of $S$ is defined as $\mathop{\uparrow} S \stackrel{\text{def}}{=} \{\, x \in L \mid \exists s \in S \mathrel{.} s \sqsubseteq x \,\}$. The set $S \subseteq L$ is *upward closed* if $S = \mathop{\uparrow} S$; we denote by $\wp_{\uparrow}(L)$ the set of all upward closed subsets of $L$. For $x \in L$, $\mathop{\uparrow} x$ is a shorthand for $\mathop{\uparrow}\{x\}$. The notation for *downward closure* is similar. Given two posets $\langle L, \sqsubseteq \rangle$ and $\langle L', \sqsubseteq' \rangle$ and two monotonic functions $\alpha\colon L \to L'$ and $\gamma\colon L' \to L$, the pair $(\alpha, \gamma)$ is a *Galois connection* [14] between $L$ and $L'$ if $\forall x \in L, x' \in L' : \alpha(x) \sqsubseteq' x' \Leftrightarrow x \sqsubseteq \gamma(x')$.

We write $\mathbb{R}^n$ to denote the Euclidean topological space of dimension $n > 0$ and $\mathbb{R}_+$ for the set of non-negative reals; for $S \subseteq \mathbb{R}^n$, $\operatorname{cl}(S)$ and $\operatorname{relint}(S)$ denote the topological closure and the relative interior of $S$, respectively. A topologically closed convex polyhedron (for short, closed polyhedron) is defined as the set of solutions of a finite system $\mathcal{C}$ of linear non-strict inequality and linear equality constraints; namely, $\mathcal{P} = \operatorname{con}(\mathcal{C})$ where

$$\operatorname{con}(\mathcal{C}) \stackrel{\text{def}}{=} \bigl\{\, \boldsymbol{p} \in \mathbb{R}^n \bigm| \forall \beta = (\boldsymbol{a}^\mathrm{T}\boldsymbol{x} \bowtie b) \in \mathcal{C}, \bowtie \in \{\geq, =\} \mathrel{.} \boldsymbol{a}^\mathrm{T}\boldsymbol{p} \bowtie b \,\bigr\}.$$

A vector $\boldsymbol{r} \in \mathbb{R}^n$ such that $\boldsymbol{r} \neq \boldsymbol{0}$ is a *ray* of a non-empty polyhedron $\mathcal{P} \subseteq \mathbb{R}^n$ if, $\forall \boldsymbol{p} \in \mathcal{P}$ and $\forall \rho \in \mathbb{R}_+$, it holds that $\boldsymbol{p} + \rho\boldsymbol{r} \in \mathcal{P}$. The empty polyhedron has no rays. If both $\boldsymbol{r}$ and $-\boldsymbol{r}$ are rays of $\mathcal{P}$, then $\boldsymbol{r}$ is a *line* of $\mathcal{P}$. The set $\mathcal{P} \subseteq \mathbb{R}^n$ is a closed polyhedron if there exist finite sets $L, R, P \subseteq \mathbb{R}^n$ such that $\boldsymbol{0} \notin (L \cup R)$ and $\mathcal{P} = \operatorname{gen}(\langle L, R, P \rangle)$, where

$$\operatorname{gen} \left( \langle L, R, P \rangle \right) \stackrel{\text{def}}{=} \left\{ L\lambda + R\rho + P\pi \in \mathbb{R}^n \mid \lambda \in \mathbb{R}^\ell, \rho \in \mathbb{R}\_+^r, \pi \in \mathbb{R}\_+^p, \sum\_{i=1}^p \pi\_i = 1 \right\}.$$

When $\mathcal{P} \neq \emptyset$, we say that $\mathcal{P}$ is described by the *generator system* $\mathcal{G} = \langle L, R, P \rangle$. In the following, we will abuse notation by adopting the usual set operator and relation symbols to denote the corresponding component-wise extensions on systems. For instance, for $\mathcal{G} = \langle L, R, P \rangle$ and $\mathcal{G}' = \langle L', R', P' \rangle$, we will write $\mathcal{G} \subseteq \mathcal{G}'$ to mean $L \subseteq L'$, $R \subseteq R'$ and $P \subseteq P'$.

The DD method due to Motzkin et al. [28] allows combining the constraints and the generators of a polyhedron $\mathcal{P}$ into a DD pair $(\mathcal{C}, \mathcal{G})$: a *conversion* procedure [10–12] is used to obtain each description starting from the other one, also removing the redundant elements. For presentation purposes, we focus on the conversion from constraints to generators; the opposite conversion works in the same way, using duality to switch the roles of constraints and generators. We do not describe lower-level details such as the *homogenization* process, mapping the polyhedron into a polyhedral cone, or the *simplification* step, needed for computing DD pairs in minimal form.

The conversion procedure starts from a DD pair $(\mathcal{C}_0, \mathcal{G}_0)$ representing the whole vector space and adds, one at a time, the elements of the input constraint system $\mathcal{C} = \{\beta_0, \ldots, \beta_m\}$, producing a sequence of DD pairs $\bigl((\mathcal{C}_k, \mathcal{G}_k)\bigr)_{0 \leq k \leq m+1}$ representing the polyhedra

$$\mathbb{R}^n = \mathcal{P}\_0 \xrightarrow{\beta\_0} \dots \xrightarrow{\beta\_{k-1}} \mathcal{P}\_k \xrightarrow{\beta\_k} \mathcal{P}\_{k+1} \xrightarrow{\beta\_{k+1}} \dots \xrightarrow{\beta\_m} \mathcal{P}\_{m+1} = \mathcal{P}.$$

At each iteration, when adding the constraint $\beta_k$ to the polyhedron $\mathcal{P}_k = \operatorname{gen}(\mathcal{G}_k)$, the generator system $\mathcal{G}_k$ is partitioned into the three components $\mathcal{G}_k^+$, $\mathcal{G}_k^0$, $\mathcal{G}_k^-$, according to the sign of the scalar products of the generators with $\beta_k$ (those in $\mathcal{G}_k^0$ are the *saturators* of $\beta_k$); the new generator system for the polyhedron $\mathcal{P}_{k+1}$ is computed as $\mathcal{G}_{k+1} \stackrel{\text{def}}{=} \mathcal{G}_k^+ \cup \mathcal{G}_k^0 \cup \mathcal{G}_k^\star$, where $\mathcal{G}_k^\star = \text{comb\_adj}_{\beta_k}(\mathcal{G}_k^+, \mathcal{G}_k^-)$ and

$$\text{comb\\_adj}\_{\beta\_k}(\mathcal{G}\_k^+, \mathcal{G}\_k^-) \overset{\text{def}}{=} \left\{ \text{comb}\_{\beta\_k}(g^+, g^-) \mid g^+ \in \mathcal{G}\_k^+, g^- \in \mathcal{G}\_k^-, \text{adj}\_{\mathcal{P}\_k}(g^+, g^-) \right\}.$$

Function $\text{comb}_{\beta_k}$ computes a linear combination of its arguments, yielding a generator that saturates the constraint $\beta_k$; the predicate $\text{adj}_{\mathcal{P}_k}$ is used to select only those pairs of generators that are *adjacent* in $\mathcal{P}_k$.
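The core of one conversion iteration can be sketched for the simplified case of a polyhedral cone (constraints $\boldsymbol{a}^\mathrm{T}\boldsymbol{x} \geq 0$ and generators that are rays only). The adjacency test $\text{adj}_{\mathcal{P}_k}$ is omitted here, so on larger inputs this sketch may produce redundant rays:

```python
from fractions import Fraction

def add_constraint(rays, a):
    """One incremental conversion step on a cone: partition the rays by
    the sign of a.r and combine each positive/negative pair into a ray
    that saturates the new constraint a.x >= 0."""
    def dot(u, v):
        return sum(Fraction(x) * y for x, y in zip(u, v))
    pos = [r for r in rays if dot(a, r) > 0]
    zero = [r for r in rays if dot(a, r) == 0]   # saturators of the constraint
    neg = [r for r in rays if dot(a, r) < 0]
    # comb(r+, r-) = (a.r+) * r-  -  (a.r-) * r+ ; both coefficients are
    # positive and a.comb = 0, so the result is a valid saturating ray.
    comb = [tuple(dot(a, rp) * x_n - dot(a, rn) * x_p
                  for x_p, x_n in zip(rp, rn))
            for rp in pos for rn in neg]
    return pos + zero + comb
```

For example, adding $x_1 - x_2 \geq 0$ to the first quadrant of $\mathbb{R}^2$ (rays $(1,0)$ and $(0,1)$) drops the negative ray $(0,1)$ and introduces the saturating combination $(1,1)$.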

The set $\mathbb{CP}_n$ of all closed polyhedra on the vector space $\mathbb{R}^n$, partially ordered by set inclusion, is a lattice $\langle \mathbb{CP}_n, \subseteq, \emptyset, \mathbb{R}^n, \cap, \uplus \rangle$, where the empty set and $\mathbb{R}^n$ are the bottom and top elements, the binary meet operator is set intersection and the binary join operator $\uplus$ is the convex polyhedral hull. A constraint $\beta = (\boldsymbol{a}^\mathrm{T}\boldsymbol{x} \bowtie b)$ is said to be *valid* for $\mathcal{P} \in \mathbb{CP}_n$ if all the points in $\mathcal{P}$ satisfy $\beta$; for each such $\beta$, the subset $F = \{\, \boldsymbol{p} \in \mathcal{P} \mid \boldsymbol{a}^\mathrm{T}\boldsymbol{p} = b \,\}$ is a *face* of $\mathcal{P}$. We write $\textit{cFaces}_\mathcal{P}$ (possibly omitting the subscript) to denote the finite set of faces of $\mathcal{P} \in \mathbb{CP}_n$. This is a meet sublattice of $\mathbb{CP}_n$ and $\mathcal{P} = \bigcup \bigl\{\, \operatorname{relint}(F) \bigm| F \in \textit{cFaces}_\mathcal{P} \,\bigr\}$.

When $\mathcal{C}$ is extended to allow for *strict* inequalities, $\mathcal{P} = \operatorname{con}(\mathcal{C})$ is an NNC (not necessarily closed) polyhedron. The set $\mathbb{P}_n$ of all NNC polyhedra on $\mathbb{R}^n$ is a lattice $\langle \mathbb{P}_n, \subseteq, \emptyset, \mathbb{R}^n, \cap, \uplus \rangle$ and $\mathbb{CP}_n$ is a sublattice of $\mathbb{P}_n$. As shown in [3, Theorem 4.4], a description of an NNC polyhedron $\mathcal{P} \in \mathbb{P}_n$ can be obtained by extending the generator system with a finite set $C$ of *closure points*. Namely, for $\mathcal{G} = \langle L, R, C, P \rangle$, we define $\mathcal{P} = \operatorname{gen}(\mathcal{G})$, where

$$\operatorname{gen}(\langle L, R, C, P \rangle) \stackrel{\text{def}}{=} \left\{ L\lambda + R\rho + C\gamma + P\pi \in \mathbb{R}^n \;\middle|\; \begin{array}{l} \lambda \in \mathbb{R}^\ell,\ \rho \in \mathbb{R}_+^r, \\ \gamma \in \mathbb{R}_+^c,\ \pi \in \mathbb{R}_+^p,\ \pi \neq \mathbf{0}, \\ \sum_{i=1}^c \gamma_i + \sum_{i=1}^p \pi_i = 1 \end{array} \right\}.$$

For an NNC polyhedron $\mathcal{P} \in \mathbb{P}_n$, the finite set $nncFaces_{\mathcal{P}}$ of its faces is a meet sublattice of $\mathbb{P}_n$ and $\mathcal{P} = \bigcup \{\, \operatorname{relint}(F) \mid F \in nncFaces_{\mathcal{P}} \,\}$. Letting $\mathcal{Q} = \operatorname{cl}(\mathcal{P})$, the closure operator $\operatorname{cl} \colon nncFaces_{\mathcal{P}} \to cFaces_{\mathcal{Q}}$ maps each NNC face of $\mathcal{P}$ into a face of $\mathcal{Q}$. The image $\operatorname{cl}(nncFaces_{\mathcal{P}})$ is a join sublattice of $cFaces_{\mathcal{Q}}$ and its nonempty elements form an *upward closed subset*, which can be described by recording the minimal elements only (i.e., the atoms of the $nncFaces_{\mathcal{P}}$ lattice).
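As a concrete reading of the definition of '$\operatorname{gen}$', the following sketch evaluates one combination of generators under the stated side conditions (illustrative Python; the representation and all names are our own). The requirement $\pi \neq \mathbf{0}$ is what prevents a closure point from being generated on its own.

```python
from fractions import Fraction

def gen_point(L, R, C, P, lam, rho, gam, pi):
    """Evaluate L*lam + R*rho + C*gam + P*pi under the side conditions of
    'gen': rho, gam, pi nonnegative, pi != 0, sum(gam) + sum(pi) == 1.
    Generators are tuples of Fractions; assumes at least one generator."""
    assert all(r >= 0 for r in rho) and all(g >= 0 for g in gam)
    assert all(p >= 0 for p in pi) and any(p > 0 for p in pi)
    assert sum(gam) + sum(pi) == 1
    dim = len((L + R + C + P)[0])
    pt = [Fraction(0)] * dim
    for gens, coeffs in ((L, lam), (R, rho), (C, gam), (P, pi)):
        for g, c in zip(gens, coeffs):
            for i in range(dim):
                pt[i] += c * g[i]
    return tuple(pt)
```

For instance, a closure point at 0 and a point at 1 in $\mathbb{R}^1$ generate the half-open segment $(0, 1]$; the midpoint is obtained with $\gamma = \pi = 1/2$.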

#### **3 Direct Representations for NNC Polyhedra**

An NNC polyhedron can be described by using an extended constraint system $\mathcal{C} = \langle \mathcal{C}^=, \mathcal{C}^\geq, \mathcal{C}^> \rangle$ and/or an extended generator system $\mathcal{G} = \langle L, R, C, P \rangle$. These representations are said to be *geometric*, meaning that they provide a precise description of the position of their elements. For a closed polyhedron $\mathcal{P} \in \mathbb{CP}_n$, the use of completely geometric representations is an adequate choice. In the case of an NNC polyhedron $\mathcal{P} \in \mathbb{P}_n$ such a choice is questionable, since the precise geometric position of some of the elements is not really needed.

*Example 1.* Consider the NNC polyhedron $\mathcal{P} \in \mathbb{P}_2$ in the next figure, where the (strict) inequality constraints are denoted by (dashed) lines and the (closure) points are denoted by (unfilled) circles.

$\mathcal{P}$ is described by $\mathcal{G} = \langle L, R, C, P \rangle$, where $L = R = \emptyset$, $C = \{c_0, c_1, c_2\}$ and $P = \{p_0, p_1\}$. However, there is no need to know the position of point $p_1$, since it can be replaced by any other point on the open segment $(c_0, c_1)$. Similarly, when considering the constraint representation, there is no need to know the exact slope of the strict inequality constraint $\beta$.

We now show that $\mathcal{P} \in \mathbb{P}_n$ can be more appropriately represented by integrating a geometric description of $\mathcal{Q} = \operatorname{cl}(\mathcal{P}) \in \mathbb{CP}_n$ (the *skeleton*) with a combinatorial description of $nncFaces_{\mathcal{P}}$ (the *non-skeleton*). We consider here the generator system representation; the extension to constraints will be briefly outlined in a later section.

**Definition 1 (Skeleton of a generator system).** *Let* $\mathcal{G} = \langle L, R, C, P \rangle$ *be a generator system in minimal form,* $\mathcal{P} = \operatorname{gen}(\mathcal{G})$ *and* $\mathcal{Q} = \operatorname{cl}(\mathcal{P})$*. The* skeleton *of* $\mathcal{G}$ *is* $\mathcal{SK}_{\mathcal{Q}} = \operatorname{skel}(\mathcal{G}) \stackrel{\text{def}}{=} \langle L, R, C \cup SP, \emptyset \rangle$*, where* $SP \subseteq P$ *holds the points that cannot be obtained by combining the other generators in* $\mathcal{G}$*.*

Note that the skeleton has no points at all, so that $\operatorname{gen}(\mathcal{SK}_{\mathcal{Q}}) = \emptyset$. However, we can define a variant function $\overline{\operatorname{gen}}(\langle L, R, C, P \rangle) \stackrel{\text{def}}{=} \operatorname{gen}(\langle L, R, \emptyset, C \cup P \rangle)$, showing that the skeleton of an NNC polyhedron provides a non-redundant representation of its topological closure.

**Proposition 1.** *If* $\mathcal{P} = \operatorname{gen}(\mathcal{G})$ *and* $\mathcal{Q} = \operatorname{cl}(\mathcal{P})$*, then* $\overline{\operatorname{gen}}(\mathcal{G}) = \overline{\operatorname{gen}}(\mathcal{SK}_{\mathcal{Q}}) = \mathcal{Q}$*. Also, there does not exist* $\mathcal{G}' \subset \mathcal{SK}_{\mathcal{Q}}$ *such that* $\overline{\operatorname{gen}}(\mathcal{G}') = \mathcal{Q}$*.*

The elements of $SP \subseteq P$ are called *skeleton points*; the non-skeleton points in $P \setminus SP$ are redundant when representing the topological closure; these *non-skeleton points* are the elements in $\mathcal{G}$ that need not be represented geometrically.

Consider a point $p \in \mathcal{Q} = \operatorname{cl}(\mathcal{P})$ (not necessarily in $\mathcal{P}$). There exists a single face $F \in cFaces_{\mathcal{Q}}$ such that $p \in \operatorname{relint}(F)$. By definition of function '$\operatorname{gen}$', point $p$ behaves as a *filler* for $\operatorname{relint}(F)$, meaning that, when combined with the skeleton, it generates $\operatorname{relint}(F)$. Note that $p$ also behaves as a filler for the relative interiors of all the faces in the set $\uparrow F$. The choice of $p \in \operatorname{relint}(F)$ is actually arbitrary: any other point of $\operatorname{relint}(F)$ would be equivalent as a filler. A less arbitrary representation for $\operatorname{relint}(F)$ is thus provided by its own skeleton $\mathcal{SK}_F \subseteq \mathcal{SK}_{\mathcal{Q}}$; we say that $\mathcal{SK}_F$ is the *support* for the points in $\operatorname{relint}(F)$ and that any point $p \in \operatorname{relint}(\overline{\operatorname{gen}}(\mathcal{SK}_F)) = \operatorname{relint}(F)$ is a *materialization* of $\mathcal{SK}_F$.

In the following we will sometimes omit subscripts when clear from context.

**Definition 2 (Support sets for a skeleton).** *Let* $\mathcal{SK}$ *be the skeleton of an NNC polyhedron and let* $\mathcal{Q} = \overline{\operatorname{gen}}(\mathcal{SK}) \in \mathbb{CP}_n$*. The set of all supports for* $\mathcal{SK}$ *is defined as* $\mathbb{NS}_{\mathcal{SK}} \stackrel{\text{def}}{=} \{\, \mathcal{SK}_F \subseteq \mathcal{SK} \mid F \in cFaces_{\mathcal{Q}} \,\}$*.*

We now define functions mapping a subset of the (geometric) points of an NNC polyhedron into the set of supports filled by these points, and vice versa.

**Definition 3 (Filled supports).** *Let* $\mathcal{SK}$ *be the skeleton of the polyhedron* $\mathcal{P} \in \mathbb{P}_n$*,* $\mathcal{Q} = \operatorname{cl}(\mathcal{P})$ *and* $\mathbb{NS}$ *be the corresponding set of supports. The abstraction function* $\alpha_{\mathcal{SK}} \colon \wp(\mathcal{Q}) \to \wp_{\uparrow}(\mathbb{NS})$ *is defined, for each* $S \subseteq \mathcal{Q}$*, as*

$$\alpha_{\mathcal{SK}}(S) \stackrel{\text{def}}{=} \bigcup \{\, \uparrow \mathcal{SK}_F \mid \exists p \in S, F \in cFaces \,.\; p \in \operatorname{relint}(F) \,\}.$$

*The concretization function* $\gamma_{\mathcal{SK}} \colon \wp_{\uparrow}(\mathbb{NS}) \to \wp(\mathcal{Q})$*, for each* $NS \in \wp_{\uparrow}(\mathbb{NS})$*, is defined as*

$$\gamma_{\mathcal{SK}}(NS) \stackrel{\text{def}}{=} \bigcup \{\, \operatorname{relint}(\overline{\operatorname{gen}}(ns)) \mid ns \in NS \,\}.$$

**Proposition 2.** *The pair of functions* $(\alpha_{\mathcal{SK}}, \gamma_{\mathcal{SK}})$ *is a Galois connection. If* $\mathcal{P} = \operatorname{gen}(\langle L, R, C, P \rangle) \in \mathbb{P}_n$ *and* $\mathcal{SK}$ *is its skeleton, then* $\mathcal{P} = (\gamma_{\mathcal{SK}} \circ \alpha_{\mathcal{SK}})(P)$*.*

The non-skeleton component of a geometric generator system can be abstracted by '$\alpha_{\mathcal{SK}}$' and described as a combination of skeleton generators.

**Definition 4 (Non-skeleton of a generator system).** *Let* $\mathcal{P} \in \mathbb{P}_n$ *be defined by generator system* $\mathcal{G} = \langle L, R, C, P \rangle$ *and let* $\mathcal{SK}$ *be the corresponding skeleton component. The* non-skeleton *component of* $\mathcal{G}$ *is defined as* $NS_{\mathcal{G}} \stackrel{\text{def}}{=} \alpha_{\mathcal{SK}}(P)$*.*

*Example 2.* Consider the generator system $\mathcal{G}$ of polyhedron $\mathcal{P}$ from Example 1. Its skeleton is $\mathcal{SK} = \langle \emptyset, \emptyset, \{c_0, c_1, c_2, p_0\}, \emptyset \rangle$, so that $p_1$ is not a skeleton point. By Definition 3, $NS_{\mathcal{G}} = \alpha_{\mathcal{SK}}(\{p_0, p_1\}) = \uparrow\{p_0\} \cup \uparrow\{c_0, c_1\}$.<sup>2</sup> The minimal elements in $NS_{\mathcal{G}}$ can be seen to describe the atoms of $nncFaces_{\mathcal{P}}$, i.e., the 0-dimension face $\{p_0\}$ and the 1-dimension open segment $(c_0, c_1)$.
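The abstraction of Example 2 can be replayed combinatorially. In the sketch below (illustrative Python; the map `support_of` and the chosen set of supports are our own stand-ins for the geometric face computation), each point contributes the upward closure of the support of the face whose relative interior contains it:

```python
def up_closure(ns, supports):
    """The set ↑ns restricted to a given finite set of skeleton supports."""
    return {f for f in supports if ns <= f}

def alpha(points, support_of, supports):
    """Sketch of alpha_SK: each point p fills the unique face whose
    relative interior contains it; support_of(p) is assumed to return
    the skeleton support SK_F of that face."""
    result = set()
    for p in points:
        result |= up_closure(support_of(p), supports)
    return result
```

With $p_0$ lying on the vertex $\{p_0\}$ and $p_1$ on the open segment supported by $\{c_0, c_1\}$, the abstraction returns exactly $\uparrow\{p_0\} \cup \uparrow\{c_0, c_1\}$ within the assumed support set.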

The new representation is semantically equivalent to the fully geometric one.

<sup>2</sup> Since there are no rays and no lines, we adopt a simplified notation, identifying each support with the set of its closure points. Also note that $\operatorname{relint}(\{p_0\}) = \{p_0\}$.

**Corollary 1.** *For a polyhedron* $\mathcal{P} = \operatorname{gen}(\mathcal{G}) \in \mathbb{P}_n$*, let* $\langle \mathcal{SK}, NS \rangle$ *be the skeleton and non-skeleton components for* $\mathcal{G}$*. Then* $\mathcal{P} = \gamma_{\mathcal{SK}}(NS)$*.*

#### **4 The New Conversion Algorithm**

The conversion function in Pseudocode 1 incrementally processes each of the input constraints $\beta \in \mathcal{C}_{in}$, keeping the generator system $\langle \mathcal{SK}, NS \rangle$ up to date. The distinction between the skeleton and non-skeleton allows for a corresponding separation in the conversion procedure. Moreover, a few minor adaptations to their representation, discussed below, allow for efficiency improvements.

First, observe that every support $ns \in NS$ always includes all of the lines in the skeleton component $L$; hence, these lines can be left *implicit* in the representation of the supports in $NS$. Note that, even after removing the lines, each $ns \in NS$ is still a non-empty set, since it includes at least one closure point.

When lines are implicit, those supports $ns \in NS$ that happen to be singletons<sup>3</sup> can be seen to play a special role: they correspond to the combinatorial encoding of the skeleton points in $SP$ (see Definition 1). These points are not going to benefit from the combinatorial representation, hence we move them from the non-skeleton to the skeleton component; namely, $\mathcal{SK} = \langle L, R, C \cup SP, \emptyset \rangle$ is represented as $\mathcal{SK} = \langle L, R, C, SP \rangle$. The formalization presented in Sect. 3 is still valid, replacing '$\gamma_{\mathcal{SK}}$' with $\gamma'_{\mathcal{SK}}(NS) \stackrel{\text{def}}{=} \operatorname{gen}(\mathcal{SK}) \cup \gamma_{\mathcal{SK}}(NS)$.

At the implementation level, each support $ns \in NS$ can be encoded by using a *set of indices* on the data structure representing the skeleton component $\mathcal{SK}$. Since $NS$ is a finite upward closed set, the representation only needs to record its minimal elements. A support $ns \in NS$ is *redundant in* $\langle \mathcal{SK}, NS \rangle$ if there exists $ns' \in NS$ such that $ns' \subset ns$, or if $ns \cap SP \neq \emptyset$, where $\mathcal{SK} = \langle L, R, C, SP \rangle$. We write $NS_1 \oplus NS_2$ to denote the non-redundant union of $NS_1, NS_2 \subseteq \mathbb{NS}_{\mathcal{SK}}$.
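The redundancy test and the non-redundant union '$\oplus$' act purely on index sets. A minimal sketch (illustrative Python, with supports as frozensets of skeleton indices; all names are our own):

```python
def is_redundant(ns, NS, SP):
    """A support is redundant if it contains a skeleton point (that point
    already fills the face), or if some other support in NS is strictly
    contained in it (upward closure makes it implicit)."""
    return bool(ns & SP) or any(other < ns for other in NS if other != ns)

def nonredundant_union(NS1, NS2, SP):
    """NS1 ⊕ NS2: union keeping only the minimal, non-redundant supports."""
    candidates = set(NS1) | set(NS2)
    return {ns for ns in candidates if not is_redundant(ns, candidates, SP)}
```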

#### **4.1 Processing the Skeleton**

Line 3 of conversion partitions the skeleton $\mathcal{SK}$ into $\mathcal{SK}^+$, $\mathcal{SK}^0$ and $\mathcal{SK}^-$, according to the signs of the scalar products with constraint $\beta$. Note that the partition information is *logically* computed (no copies are performed) and it is stored in the $\mathcal{SK}$ component itself; therefore, any update to $\mathcal{SK}^+$, $\mathcal{SK}^0$ and $\mathcal{SK}^-$ directly propagates to $\mathcal{SK}$. In line 7 the generators in $\mathcal{SK}^+$ and $\mathcal{SK}^-$ are combined to produce $\mathcal{SK}^\star$, which is merged into $\mathcal{SK}^0$. These steps are similar to the ones for closed polyhedra, except that we now have to consider more kinds of combinations: the systematic case analysis is presented in Table 1. For instance, when processing a non-strict inequality $\beta^\geq$, if we combine a closure point in $\mathcal{SK}^+$ with a ray in $\mathcal{SK}^-$ we obtain a closure point in $\mathcal{SK}^\star$ (row 3, column 6). Since it is restricted to work on the skeleton component, this combination phase can safely apply the adjacency tests to quickly get rid of redundant elements.

<sup>3</sup> By 'singleton' here we mean a system $ns = \langle \emptyset, \emptyset, \{p\}, \emptyset \rangle$.



**Table 1.** Case analysis for function '$\operatorname{comb}_\beta$' when adding an equality ($\beta^=$), a non-strict ($\beta^\geq$) or a strict ($\beta^>$) inequality constraint to a pair of generators from $\mathcal{SK}^+$ and $\mathcal{SK}^-$ (R = ray, C = closure point, SP = skeleton point).


#### **4.2 Processing the Non-skeleton**

Line 4 partitions the supports in *NS* by exploiting the partition information for the skeleton SK, so that no additional scalar product is computed. Namely, each support *ns* ∈ *NS* is classified as follows:

$$\begin{aligned} ns \in NS^+ &\iff ns \subseteq (\mathcal{SK}^+ \cup \mathcal{SK}^0) \land ns \cap \mathcal{SK}^+ \neq \emptyset; \\ ns \in NS^0 &\iff ns \subseteq \mathcal{SK}^0; \\ ns \in NS^- &\iff ns \subseteq (\mathcal{SK}^- \cup \mathcal{SK}^0) \land ns \cap \mathcal{SK}^- \neq \emptyset; \\ ns \in NS^\pm &\iff ns \cap \mathcal{SK}^+ \neq \emptyset \land ns \cap \mathcal{SK}^- \neq \emptyset. \end{aligned}$$

This partitioning is consistent with the previous one. For instance, if $ns \in NS^+$, then for every possible materialization $p \in \operatorname{relint}(\overline{\operatorname{gen}}(ns))$ the scalar product of $p$ and $\beta$ is strictly positive. The supports in $NS^\pm$ are those whose materializations can satisfy, saturate or violate the constraint $\beta$ (i.e., the corresponding face *crosses* the constraint hyperplane).
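Since the classification uses only membership in the skeleton partition, it can be sketched without computing any scalar product (illustrative Python; names and the string return values are our own conventions):

```python
def classify(ns, SK_pos, SK_neg):
    """Classify a support against the skeleton partition induced by beta.
    Elements of ns not in SK+ or SK- are assumed to lie in SK0."""
    has_pos, has_neg = bool(ns & SK_pos), bool(ns & SK_neg)
    if has_pos and has_neg:
        return '+-'   # NS±: the face crosses the constraint hyperplane
    if has_pos:
        return '+'    # NS+: every materialization satisfies beta strictly
    if has_neg:
        return '-'    # NS-: every materialization violates beta
    return '0'        # NS0: the support saturates beta entirely
```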

In lines 8 and 9, we find the calls to the two main functions processing the non-skeleton component. A set $NS^\star$ of new supports is built as the union of the contributions provided by functions move-ns and create-ns.

**Moving Supports.** The move-ns function, shown in Pseudocode 2, processes the supports in $NS^\pm$: this function "moves" the fillers of the faces that are crossed by the new constraint, making sure they lie on the correct side.

Let $ns \in NS^\pm$ and $F = \operatorname{relint}(\overline{\operatorname{gen}}(ns))$. Note that $ns = \mathcal{SK}_F$ *before* the addition of the new constraint $\beta$; at this point, the elements in $\mathcal{SK}^\star$ have been added to $\mathcal{SK}^0$, but this change still has to be propagated to the non-skeleton component $NS$. Therefore, we compute the *support closure* '$\operatorname{supp.cl}_{\mathcal{SK}}(ns)$' according to the updated skeleton $\mathcal{SK}$. Intuitively, $\operatorname{supp.cl}_{\mathcal{SK}}(ns) \subseteq \mathcal{SK}$ is the subset of all the skeleton elements that are included in face $F$.

At the implementation level, support closures can be efficiently computed by exploiting the same *saturation information* used for the adjacency tests. Namely, for constraints C and generators G, we can define

$$\begin{aligned} \operatorname{sat.inter}_{\mathcal{C}}(\mathcal{G}) & \stackrel{\text{def}}{=} \{\, \beta' \in \mathcal{C} \mid \forall g \in \mathcal{G} : g \text{ saturates } \beta' \,\}, \\ \operatorname{sat.inter}_{\mathcal{G}}(\mathcal{C}) & \stackrel{\text{def}}{=} \{\, g \in \mathcal{G} \mid \forall \beta' \in \mathcal{C} : g \text{ saturates } \beta' \,\}. \end{aligned}$$

Then, if <sup>C</sup> and SK <sup>=</sup> L, R, C, *SP* are the constraint system and the skeleton generator system for the polyhedron, for each *ns* ∈ *NS* we can compute [26]:

$$\operatorname{supp.cl}_{\mathcal{SK}}(ns) \stackrel{\text{def}}{=} \operatorname{sat.inter}_{\mathcal{SK}}\bigl(\operatorname{sat.inter}_{\mathcal{C}}(ns)\bigr) \setminus L.$$

Face $F$ is split by constraint $\beta$ into $F^+$, $F^0$ and $F^-$. When $\beta$ is a strict inequality, only $F^+$ shall be kept in the polyhedron; when the new constraint is a non-strict inequality, both $F^+$ and $F^0$ shall be kept. A minimal non-skeleton representation for these subsets can be obtained by *projecting* the support:

$$\operatorname{proj}^{\beta}_{\mathcal{SK}}(ns) \stackrel{\text{def}}{=} \begin{cases} ns \setminus \mathcal{SK}^-, & \text{if } \beta \text{ is a strict inequality;} \\ ns \cap \mathcal{SK}^0, & \text{otherwise.} \end{cases}$$

To summarize, by composing support closure and projection in line 3 of move-ns, each support in $NS^\pm$ is moved to the correct side of $\beta$.
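The move step can be sketched as the composition just described (illustrative Python mirroring the strict case of Example 3 below; `supp_cl` is passed in as a callable and the skeleton sets are our own assumptions):

```python
def proj(ns, SK_neg, SK_zero, strict):
    """proj^beta_SK: keep the part of the closed support on the
    correct side of the new constraint beta."""
    return ns - SK_neg if strict else ns & SK_zero

def move_ns(NS_pm, supp_cl, SK_neg, SK_zero, strict):
    """move-ns sketch: re-close each crossing support with respect to the
    updated skeleton, then project it to the correct side of beta."""
    return {frozenset(proj(supp_cl(ns), SK_neg, SK_zero, strict))
            for ns in NS_pm}
```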

*Example 3.* Consider $\mathcal{P} \in \mathbb{P}_2$ in the left-hand side of the next figure.

The skeleton $\mathcal{SK} = \langle \emptyset, \emptyset, C, \emptyset \rangle$ contains the closure points in $C = \{c_0, c_1, c_2, c_3\}$; the non-skeleton $NS = \{ns\}$ contains a single support $ns = \{c_0, c_3\}$, which makes sure that the open segment $(c_0, c_3)$ is included in $\mathcal{P}$; the figure shows a single materialization for $ns$.

When processing $\beta = (y < 1)$, we obtain the polyhedron in the right-hand side of the figure. In the skeleton phase of the conversion function the adjacent skeleton generators are combined: $c_4$ (from $c_0 \in \mathcal{SK}^+$ and $c_3 \in \mathcal{SK}^-$) and $c_5$ (from $c_1 \in \mathcal{SK}^+$ and $c_2 \in \mathcal{SK}^-$) are added to $\mathcal{SK}^0$. Since the non-skeleton support $ns$ belongs to $NS^\pm$, it is processed in the move-ns function:

$$ns^\star = \operatorname{proj}^{\beta}_{\mathcal{SK}}\bigl(\operatorname{supp.cl}_{\mathcal{SK}}(ns)\bigr) = \operatorname{proj}^{\beta}_{\mathcal{SK}}(\{c_0, c_3, c_4\}) = \{c_0, c_4\}.$$

In contrast, if we were processing the non-strict inequality $\beta = (y \leq 1)$, we would have obtained $ns^\star = \operatorname{proj}^{\beta}_{\mathcal{SK}}\bigl(\operatorname{supp.cl}_{\mathcal{SK}}(ns)\bigr) = \{c_4\}$. Since $ns^\star$ is a singleton, it is upgraded to become a skeleton point by procedure promote-singletons. Hence, in this case the new skeleton is $\mathcal{SK} = \langle \emptyset, \emptyset, C, SP \rangle$, where $C = \{c_0, c_1, c_5\}$ and $SP = \{c_4\}$, while the non-skeleton component is empty.

**Creating New Supports.** Consider the case of a support $ns \in NS^-$ violating a non-strict inequality constraint $\beta$: this support has to be removed from $NS$. However, the upward closed set $NS$ is represented by its minimal elements only so that, by removing $ns$, we are also implicitly removing other supports from the set $\uparrow ns$, including some that do not belong to $NS^-$ and hence should be kept. Therefore, we have to explore the set of faces and detect those that are going to lose their filler: their minimal supports will be added to $NS^\star$. Similarly, when processing a non-strict inequality constraint, we need to consider the new faces introduced by the constraint: the corresponding supports can be found by projecting on the constraint hyperplane those faces that are possibly filled by an element in $SP^+$ or $NS^+$.

This is the task of the create-ns function, shown in Pseudocode 2. It uses enumerate-faces as a helper:<sup>4</sup> the latter provides an enumeration of all the (higher dimensional) faces that contain the initial support $ns$. The new faces are obtained by adding to $ns$ a new generator $g$ and then composing the support closure and projection functions, as done in move-ns. For efficiency purposes, a case analysis is performed so as to restrict the search area of the enumeration phase, by considering only the faces crossing the constraint.
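One level of the face enumeration can be sketched as follows. This is a heavily simplified illustration: the actual enumerate-faces recurses to higher-dimensional faces and prunes the search area, and the identity support closure used in the usage note is an assumption of ours.

```python
def enumerate_faces(ns, skeleton, supp_cl):
    """Sketch of one level of enumerate-faces: candidate faces containing
    ns, obtained by adding one more skeleton generator and then taking
    the support closure (passed in as a callable)."""
    seen = set()
    for g in skeleton - ns:
        face = frozenset(supp_cl(ns | {g}))
        if face not in seen:
            seen.add(face)
            yield face
```

For instance, with skeleton `{'a', 'b', 'c'}` and the identity closure, the faces enumerated from `{'a'}` are `{'a', 'b'}` and `{'a', 'c'}`.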

*Example 4.* Consider $\mathcal{P} \in \mathbb{P}_2$ in the left-hand side of the next figure, described by skeleton $\mathcal{SK} = \langle \emptyset, \emptyset, \{c_0, c_1, c_2\}, \{p\} \rangle$ and non-skeleton $NS = \emptyset$.

<sup>4</sup> This enumeration phase is inspired by the algorithm in [26].

**Pseudocode 2.** Helper functions for moving and creating supports.


**Pseudocode 3.** Processing a line violating constraint β.


The partition for SK induced by the non-strict inequality is as follows:

$$\mathcal{SK}^+ = \langle \emptyset, \emptyset, \emptyset, \{p\} \rangle, \quad \mathcal{SK}^0 = \langle \emptyset, \emptyset, \{c_0, c_2\}, \emptyset \rangle, \quad \mathcal{SK}^- = \langle \emptyset, \emptyset, \{c_1\}, \emptyset \rangle.$$

There are no adjacent generators in $\mathcal{SK}^+$ and $\mathcal{SK}^-$, so that $\mathcal{SK}^\star$ is empty. When processing the non-skeleton component, the skeleton point in $\mathcal{SK}^+$ will be considered in line 15 of function create-ns. The corresponding call to function enumerate-faces computes

$$res^\star = \operatorname{proj}^{\beta}_{\mathcal{SK}}\bigl(\operatorname{supp.cl}_{\mathcal{SK}}(\{p\} \cup \{c_1\})\bigr) = \operatorname{proj}^{\beta}_{\mathcal{SK}}(\{c_0, c_1, c_2, p\}) = \{c_0, c_2\},$$

thereby producing the filler for the open segment $(c_0, c_2)$. The resulting polyhedron, shown in the right-hand side of the figure, is thus described by the skeleton $\mathcal{SK} = \langle \emptyset, \emptyset, \{c_0, c_2\}, \{p\} \rangle$ and the non-skeleton $NS = \{ns\}$, where $ns = res^\star$.

It is worth noting that, when handling Example 4 adopting an entirely geometric representation, closure point $c_1$ needs to be combined with point $p$ even if the two generators are *not* adjacent: this leads to a significant efficiency penalty. Similarly, an implementation based on the ε-representation will have to combine closure point $c_1$ with point $p$ (and/or with some other ε-redundant points), because the addition of the slack variable makes them adjacent. Therefore, an implementation based on the new approach obtains a twofold benefit: first, the distinction between skeleton and non-skeleton allows for restricting the handling of non-adjacent combinations to the non-skeleton phase; second, thanks to the combinatorial representation, the non-skeleton component can be processed by using set index operations only, i.e., computing no linear combination at all.

**Preparing for Next Iteration.** In lines 10 to 15 of conversion the generator system is updated for the next iteration. The new supports in $NS^\star$ are merged (using '$\oplus$' to remove redundancies) into the appropriate portions of the non-skeleton component. In particular, when processing a strict inequality, in line 12 the helper function

$$\text{points\_become\_closure\_points}(\langle L, R, C, SP \rangle) \stackrel{\text{def}}{=} \langle L, R, C \cup SP, \emptyset \rangle$$

is applied to $\mathcal{SK}^0$, making sure that all of the skeleton points saturating $\beta$ are transformed into closure points having the same position. The final processing step (line 15) calls helper procedure promote-singletons (see Pseudocode 2), making sure that all singleton supports get promoted to skeleton points.

Note that line 5 of conversion, by calling procedure violating-line (see Pseudocode 3), handles the special case of a line violating $\beta$. This is just an optimization: the helper procedure strict-on-eq-points can be seen as a tailored version of create-ns, also including the final updating of $\mathcal{SK}$ and $NS$.

#### **4.3 Duality**

The definitions given in Sect. 3 for a geometric generator system have their dual versions working on a geometric *constraint* system. We provide a brief overview of these correspondences, which are summarized in Table 2.


**Table 2.** Correspondences between generator and constraint concepts.

For a non-empty $\mathcal{P} = \operatorname{con}(\mathcal{C}) \in \mathbb{P}_n$, the skeleton of $\mathcal{C} = \langle \mathcal{C}^=, \mathcal{C}^\geq, \mathcal{C}^> \rangle$ includes the non-redundant constraints defining $\mathcal{Q} = \operatorname{cl}(\mathcal{P})$. Denoting by $SC^>$ the *skeleton strict inequalities* (i.e., those whose corresponding non-strict inequality is not redundant for $\mathcal{Q}$), we have $\mathcal{SK}_{\mathcal{Q}} \stackrel{\text{def}}{=} \langle \mathcal{C}^=, \mathcal{C}^\geq \cup SC^>, \emptyset \rangle$, so that $\mathcal{Q} = \operatorname{con}(\mathcal{SK}_{\mathcal{Q}})$. The *ghost* faces of $\mathcal{P}$ are the faces of the closure $\mathcal{Q}$ that do not intersect $\mathcal{P}$: $gFaces_{\mathcal{P}} \stackrel{\text{def}}{=} \{\, F \in cFaces_{\mathcal{Q}} \mid F \cap \mathcal{P} = \emptyset \,\}$; thus, $\mathcal{P} = \operatorname{con}(\mathcal{SK}_{\mathcal{Q}}) \setminus \bigcup gFaces_{\mathcal{P}}$. The set $gFaces_{\mathcal{P}} \cup \{\mathcal{Q}\}$ is a meet sublattice of $cFaces_{\mathcal{Q}}$; also, $gFaces_{\mathcal{P}}$ is downward closed and can be represented by its *maximal* elements.

The skeleton support $\mathcal{SK}_F$ of a face $F \in cFaces_{\mathcal{Q}}$ is defined as the set of all the skeleton constraints that are saturated by all the points in $F$. Each face $F \in gFaces$ saturates a strict inequality $\beta^> \in \mathcal{C}^>$: we can represent such a face using its skeleton support $\mathcal{SK}_F$, of which $\beta^>$ is a possible materialization. A constraint system non-skeleton component $NS \subseteq \mathbb{NS}$ is thus a combinatorial representation of the *strict inequalities* of the polyhedron.

Hence, the non-skeleton components for generators and constraints have a complementary role: in the case of generators they are face *fillers*, marking the minimal faces that are *included* in $nncFaces$; in the case of constraints they are face *cutters*, marking the maximal faces that are *excluded* from $nncFaces$. Note that the non-redundant cutters in $gFaces$ are those having a *minimal* skeleton support, as is the case for the fillers.

As happens with lines, all the equalities in $\mathcal{C}^=$ are included in all the supports $ns \in NS$, so that, for efficiency, they are not represented explicitly. After removing the equalities, a singleton $ns \in NS$ stands for a *skeleton strict inequality* constraint, which is better represented in the skeleton component, thereby obtaining $\mathcal{SK} = \langle \mathcal{C}^=, \mathcal{C}^\geq, SC^> \rangle$. Hence, a support $ns \in NS$ is redundant if there exists $ns' \in NS$ such that $ns' \subset ns$, or if $ns \cap SC^> \neq \emptyset$.

When the concepts underlying the skeleton and non-skeleton representation are reinterpreted as discussed above, it is possible to define a conversion procedure mapping a generator representation into a constraint representation that is very similar to the one from constraints to generators.

#### **5 Experimental Evaluation**

The new representation and conversion algorithms for NNC polyhedra have been implemented and tested in the context of the PPL (Parma Polyhedra Library). A full integration in the PPL domain of NNC polyhedra is not possible, since the latter assumes the presence of the slack variable ε. The approach, summarized by the diagram in Fig. 1, is to intercept each call to the PPL's conversion (working on ε-representations in $\mathbb{CP}_{n+1}$) and pair it with a corresponding call to the new algorithm (working on the new representations in $\mathbb{P}_n$).

**Fig. 1.** High level diagram for the experimental evaluation (non-incremental case).

On the left-hand side of the diagram we see the application of the standard PPL conversion procedure: the input ε-representation is processed by 'old conversion' so as to produce the output ε-representation DD pair. The 'ε-less encoding' phase produces a copy of the input without the slack variable; this is processed by 'new conversion' to produce the output DD pair, based on the new skeleton/non-skeleton representation. After the two conversions are completed, the outputs are checked for both semantic equivalence and non-redundancy. This final checking phase was successful on all the experiments performed, which include all of the tests in the PPL. In order to assess efficiency, additional code was added to measure the time spent inside the old and new conversion procedures, disregarding the input encoding and output checking phases. It is worth stressing that several experimental evaluations, including recent ones [2], confirm that the PPL is a state-of-the-art implementation of the DD method for a wide spectrum of application contexts.

The first experiment<sup>5</sup> on efficiency is meant to evaluate the *overhead* incurred by the new representation and algorithm for NNC polyhedra when processing topologically closed polyhedra, so as to compare it with the corresponding overhead incurred by the ε-representation. To this end, we considered the ppl_lcdd demo application of the PPL, which solves the *vertex/facet enumeration problem*. In Table 3 we report the results obtained on a selection of the test benchmarks<sup>6</sup> when using: the conversion algorithm for closed polyhedra (columns 2–3); the conversion algorithm for the ε-representation of NNC polyhedra (columns 4–5); and the new conversion algorithm for the new representation of NNC polyhedra (columns 6–7). Columns 'time' report the number of milliseconds spent; columns 'sat' report the number of saturation (i.e., bit vector) operations, in millions.

The results in Table 3 show that the use of the ε-representation for closed polyhedra incurs a significant overhead. In contrast, the new representation and algorithm go beyond all expectations: in almost all of the tests there is no overhead at all (that is, any overhead incurred is so small as to be masked by the improvements obtained in other parts of the algorithm).


**Table 3.** Overhead of conversion for C polyhedra. Units: time (ms), sat (M).

The second experiment is meant to evaluate the efficiency gains obtained in a more appropriate context, i.e., when processing polyhedra that are *not* topologically closed. To this end, we consider the same benchmark discussed in [3, Table 2],<sup>7</sup> which highlights the efficiency improvement resulting from the adoption of an *enhanced* evaluation strategy (where a knowledgeable user of the

<sup>5</sup> All experiments have been performed on a laptop with an Intel Core i7-3632QM CPU, 16 GB of RAM and running GNU/Linux 4.13.0-25.

<sup>6</sup> We only show the tests where PPL time on closed polyhedra is above 20 ms.

<sup>7</sup> The test dualhypercubes.cc is distributed with the source code of the PPL.

library explicitly invokes, when appropriate, the strong minimization procedures for ε-representations) with respect to the *standard* evaluation strategy (where the user simply performs the required computation, leaving the burden of optimization to the library developers). In Table 4 we report the results obtained for the most expensive test among those described in [3, Table 2], comparing the standard and enhanced evaluation strategies for the ε-representation (rows 1 and 2) with the new algorithm (row 3). For each algorithm we show in column 2 the total number of iterations of the conversion procedures and, in the next two columns, the median and maximum sizes of the representations computed at each iteration (i.e., the size of the intermediate results); in columns 5 to 8 we show the numbers of incremental and non-incremental calls to the conversion procedures, together with the corresponding time spent (in milliseconds); in column 9 we show the time spent in strong minimization of ε-representations; in the final column, we show the overall time ratio, computed with respect to the time spent by the new algorithm.

**Table 4.** Comparing ε-representation based (standard and enhanced) computations for NNC polyhedra with the new conversion procedures.


Even when adopting the standard computation strategy (requiring no clever guess by the end user), the new algorithm obtains impressive time improvements, outperforming not only the standard, but also the enhanced computation strategy for the ε-representation. The reason for the latter efficiency improvement is that the enhanced computation strategy, when invoking the strong minimization procedures, interferes with incrementality: the figures in Table 4 confirm that the new algorithm performs three of the seven required conversions in an incremental way, while in the enhanced case they are all non-incremental. Moreover, a comparison of the iteration counts and the sizes of the intermediate results provides further evidence that the new algorithm is able to maintain a non-redundant description even *during* the iterations of a conversion.

#### **6 Conclusion**

We have presented a new approach for the representation of NNC polyhedra in the Double Description framework, avoiding the use of slack variables and distinguishing between the skeleton component, encoded geometrically, and the non-skeleton component, provided with a combinatorial encoding. We have proposed and implemented a variant of the Chernikova conversion procedure achieving significant efficiency improvements with respect to a state-of-the-art implementation of the domain of NNC polyhedra, thereby providing a solution to all the issues affecting the ε-representation approach. As future work, we plan to develop a full implementation of the domain of NNC polyhedra based on this new representation. To this end, we will have to reconsider each semantic operator already implemented by the existing libraries (which are based on the addition of a slack variable), so as to propose, implement and experimentally evaluate a corresponding correct specification based on the new approach.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Synthesis

### **What's Hard About Boolean Functional Synthesis?**

S. Akshay, Supratik Chakraborty, Shubham Goel, Sumith Kulal, and Shetal Shah

Indian Institute of Technology Bombay, Mumbai, India akshayss@cse.iitb.ac.in

**Abstract.** Given a relational specification between Boolean inputs and outputs, the goal of Boolean functional synthesis is to synthesize each output as a function of the inputs such that the specification is met. In this paper, we first show that unless some hard conjectures in complexity theory are falsified, Boolean functional synthesis must generate large Skolem functions in the worst case. Given this inherent hardness, what does one do to solve the problem? We present a two-phase algorithm, where the first phase is efficient both in terms of time and size of synthesized functions, and solves a large fraction of benchmarks. To explain this surprisingly good performance, we provide a sufficient condition under which the first phase must produce correct answers. When this condition fails, the second phase builds upon the result of the first phase, possibly requiring exponential time and generating exponential-sized functions in the worst case. Detailed experimental evaluation shows that our algorithm performs better than other techniques for a large number of benchmarks.

**Keywords:** Skolem functions · Synthesis · SAT solvers · CEGAR-based approach

#### **1 Introduction**

The algorithmic synthesis of Boolean functions satisfying relational specifications has long been of interest to logicians and computer scientists. Informally, given a Boolean relation between input and output variables denoting the specification, our goal is to synthesize each output as a function of the inputs such that the relational specification is satisfied. Such functions have also been called *Skolem functions* in the literature [23,29]. Boole [8] and Lowenheim [27] studied variants of this problem in the context of finding most general unifiers. While these studies are theoretically elegant, implementations of the underlying techniques have been found to scale poorly beyond small problem instances [28]. More recently, synthesis of Boolean functions has found important applications in a wide range of contexts including reactive strategy synthesis [4,19,40], certified QBF-SAT solving [7,21,31,34], automated program synthesis [35,37], circuit repair and debugging [22], disjunctive decomposition of symbolic transition relations [39] and the like. This has spurred recent interest in developing practically efficient Boolean function synthesis algorithms. The resulting new generation of tools [3,17,23,29,33,34,38] have enabled synthesis of Boolean functions from much larger and more complex relational specifications than those that could be handled by earlier techniques, viz. [20,21,28].

In this paper, we re-examine the Boolean functional synthesis problem from both theoretical and practical perspectives. Our investigation shows that unless some hard conjectures in complexity theory are falsified, Boolean functional synthesis must necessarily generate super-polynomial sized Skolem functions, thereby requiring super-polynomial time, in the worst case. Therefore, it is unlikely that an efficient algorithm exists for solving all instances of Boolean functional synthesis. There are two ways to address this hardness in practice: (i) design algorithms that are provably efficient but may give "approximate" Skolem functions that are correct on only a fraction of all possible input assignments, or (ii) design a phased algorithm, wherein the initial phase(s) is/are provably efficient and solve a subset of problem instances, and subsequent phase(s) have worst-case exponential behaviour and solve all remaining problem instances. In this paper, we combine the two approaches, with heavy emphasis on the efficient instances. We also provide a sufficient condition for our algorithm to be efficient, which indeed is borne out by our experiments.

The primary contributions of this paper can be summarized as follows.

	- (a) Phase 1 of our algorithm generates candidate Skolem functions of size polynomial in the input specification. This phase makes polynomially many calls to an NP oracle (a SAT solver in practice). Hence it directly benefits from the progress made by the SAT solving community, and is efficient in practice. Our experiments indicate that Phase 1 suffices to solve a large majority of publicly available benchmarks.
	- (b) However, there are indeed cases where the first phase is not enough (our theoretical results imply that such cases likely exist). In such cases, the first phase provides good candidate Skolem functions as starting points for the second phase. Phase 2 of our algorithm starts from these candidate Skolem functions, and uses a CEGAR-based approach to produce correct Skolem functions whose size may indeed be exponential in the input specification.

rise to input structures that satisfy this condition. The goodness of Skolem functions generated in this phase of the algorithm can also be quantified with high confidence by invoking an approximate model counter [13], whose complexity lies in BPPNP.

4. We conduct an extensive set of experiments over a variety of benchmarks, and show that our algorithm performs favourably vis-a-vis state-of-the-art algorithms for Boolean functional synthesis.

*Related Work.* The literature contains several early theoretical studies on variants of Boolean functional synthesis [6,8,9,16,27,30]. More recently, researchers have tried to build practically efficient synthesis tools that scale to medium or large problem instances. In [29], Skolem functions for **X** are extracted from a proof of validity of ∀**Y** ∃**X** F(**X**, **Y**). Unfortunately, this does not work when ∀**Y** ∃**X** F(**X**, **Y**) is not valid, despite this class of problems being important, as discussed in [3,17]. Inspired by the spectacular effectiveness of CDCL-based SAT solvers, an incremental determinization technique for Skolem function synthesis was proposed in [33]. In [20,39], a synthesis approach based on iterated compositions was proposed. Unfortunately, as has been noted in [17,23], this does not scale to large benchmarks. A recent work [17] adapts the composition-based approach to work with ROBDDs. For factored specifications, ideas from symbolic model checking using implicitly conjoined ROBDDs have been used to enhance the scalability of the technique further in [38]. In the genre of CEGAR-based techniques, [23] showed how CEGAR can be used to synthesize Skolem functions from factored specifications. Subsequently, a compositional and parallel technique for Skolem function synthesis from arbitrary specifications represented using AIGs was presented in [3]. The second phase of our algorithm builds on some of this work. In addition to the above techniques, template-based [37] or sketch-based [36] approaches have been found to be effective for synthesis when we have information about the set of candidate solutions. A framework for functional synthesis that reasons about some unbounded domains, such as integer arithmetic, was proposed in [25].

#### **2 Notations and Problem Statement**

A Boolean formula F(z_1, ..., z_p) on p variables is a mapping F : {0, 1}^p → {0, 1}. The set of variables {z_1, ..., z_p} is called the *support* of the formula, and denoted sup(F). A *literal* is either a variable or its complement. We use F|_{z_i=1} (resp. F|_{z_i=0}) to denote the positive (resp. negative) cofactor of F with respect to z_i. A *satisfying assignment* or *model* of F is a mapping of variables in sup(F) to {0, 1} such that F evaluates to 1 under this assignment. If π is a model of F, we write π |= F and use π(z_i) to denote the value assigned to z_i ∈ sup(F) by π. Let **Z** = (z_{i_1}, z_{i_2}, ..., z_{i_j}) be a sequence of variables in sup(F). We use π↓**Z** to denote the projection of π on **Z**, i.e. the sequence (π(z_{i_1}), π(z_{i_2}), ..., π(z_{i_j})).
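To make these definitions concrete, here is a small illustrative sketch (ours, not from the paper) in Python. A formula is represented as a callable over assignment dictionaries; `cofactor` and `models` are hypothetical helper names.

```python
from itertools import product

def cofactor(f, var, val):
    """Restrict f by fixing `var` to `val`. With val=1 this is the
    positive cofactor F|var=1; with val=0 the negative cofactor."""
    return lambda assign: f({**assign, var: val})

def models(f, support):
    """Enumerate all satisfying assignments (models) of f over `support`."""
    for bits in product([0, 1], repeat=len(support)):
        assign = dict(zip(support, bits))
        if f(assign):
            yield assign

# Example: F(z1, z2) = z1 OR NOT z2
F = lambda a: a['z1'] or (1 - a['z2'])
pos = cofactor(F, 'z1', 1)                 # F|z1=1 is identically 1
assert all(pos(m) for m in ({'z2': 0}, {'z2': 1}))
print(len(list(models(F, ['z1', 'z2']))))  # 3 models: 00, 10, 11
```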

A Boolean formula is in *negation normal form (NNF)* if (i) the only operators used in the formula are conjunction (∧), disjunction (∨) and negation (¬), and (ii) negation is applied only to variables. Every Boolean formula can be converted to a semantically equivalent formula in NNF. We assume an NNF formula is represented by a rooted directed acyclic graph (DAG), where internal nodes are labeled by ∧ and ∨, and leaves are labeled by literals. In this paper, we use AIGs [24] as the initial representation of specifications. Given an AIG with t nodes, an equivalent NNF formula of size O(t) can be constructed in O(t) time. We use |F| to denote the number of nodes in a DAG representation of F.

Let α be the subformula represented by an internal node N (labeled by ∧ or ∨) in a DAG representation of an NNF formula. We use lits(α) to denote the set of literals labeling leaves that have a path to the node N representing α in the DAG. A formula is said to be in *weak decomposable NNF*, or wDNNF, if it is in NNF and if for every ∧-labeled node in the DAG, the following holds: let α = α_1 ∧ ... ∧ α_k be the subformula represented by the internal node. Then, there is no literal l and distinct indices i, j ∈ {1, ..., k} such that l ∈ lits(α_i) and ¬l ∈ lits(α_j). Note that wDNNF is a weaker structural requirement on the NNF representation vis-a-vis the well-studied DNNF representation, which has elegant properties [15]. Specifically, every DNNF formula is also a wDNNF formula.
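The wDNNF condition can be checked syntactically on the representation. The following sketch (ours; a naive recursive check on a tuple-encoded NNF AST, ignoring DAG sharing) illustrates the definition.

```python
def lits(node):
    """Set of literals (variable, polarity) at the leaves under an NNF node."""
    op = node[0]
    if op == 'lit':
        return {(node[1], node[2])}                       # polarity in {0, 1}
    return set().union(*(lits(c) for c in node[1:]))      # 'and' / 'or' children

def is_wdnnf(node):
    """True iff no 'and' node has literal l under one child and ¬l under another."""
    op = node[0]
    if op == 'lit':
        return True
    if op == 'and':
        children = node[1:]
        for i in range(len(children)):
            for j in range(len(children)):
                if i != j and any((v, 1 - p) in lits(children[j])
                                  for (v, p) in lits(children[i])):
                    return False
    return all(is_wdnnf(c) for c in node[1:])

# (x AND y) OR (NOT x AND NOT y) is in wDNNF ...
good = ('or', ('and', ('lit', 'x', 1), ('lit', 'y', 1)),
              ('and', ('lit', 'x', 0), ('lit', 'y', 0)))
# ... but x AND (NOT x OR y) is not: x and ¬x meet under the same 'and'
bad = ('and', ('lit', 'x', 1), ('or', ('lit', 'x', 0), ('lit', 'y', 1)))
assert is_wdnnf(good) and not is_wdnnf(bad)
```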

We say a *literal* l is *pure* in F iff the NNF representation of F has a leaf labeled l, but no leaf labeled ¬l. F is said to be *positive unate* in z_i ∈ sup(F) iff F|_{z_i=0} ⇒ F|_{z_i=1}. Similarly, F is said to be *negative unate* in z_i iff F|_{z_i=1} ⇒ F|_{z_i=0}. Finally, F is *unate* in z_i if F is either positive unate or negative unate in z_i. A function that is not unate in z_i ∈ sup(F) is said to be *binate* in z_i.

We also use **X** = (x_1, ..., x_n) to denote a sequence of Boolean outputs, and **Y** = (y_1, ..., y_m) to denote a sequence of Boolean inputs. The *Boolean functional synthesis* problem, henceforth denoted BFnS, asks: given a Boolean formula F(**X**, **Y**) specifying a relation between inputs **Y** = (y_1, ..., y_m) and outputs **X** = (x_1, ..., x_n), determine functions **Ψ** = (ψ_1(**Y**), ..., ψ_n(**Y**)) such that F(**Ψ**, **Y**) holds whenever ∃**X** F(**X**, **Y**) holds. Thus, ∀**Y** (∃**X** F(**X**, **Y**)) ⇔ F(**Ψ**, **Y**) must be rendered valid. The function ψ_i is called a *Skolem function* for x_i in F, and **Ψ** = (ψ_1, ..., ψ_n) is called a *Skolem function vector* for **X** in F.
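A brute-force check of the defining condition ∀**Y** (∃**X** F(**X**, **Y**)) ⇔ F(**Ψ**, **Y**) can be sketched as follows (illustrative only; function names are ours).

```python
from itertools import product

def is_skolem_vector(F, psi, n, m):
    """Check ∀Y: (∃X F(X,Y)) <=> F(Ψ(Y), Y) by exhaustive enumeration.
    F: callable F(xs, ys) -> truthy; psi: list of callables psi_i(ys)."""
    for ys in product([0, 1], repeat=m):
        exists_x = any(F(xs, ys) for xs in product([0, 1], repeat=n))
        xs = tuple(p(ys) for p in psi)
        if exists_x != bool(F(xs, ys)):
            return False
    return True

# Specification X = Y (componentwise), n = m = 2; psi_i(Y) = y_i is correct
F = lambda xs, ys: all(x == y for x, y in zip(xs, ys))
psi = [lambda ys: ys[0], lambda ys: ys[1]]
assert is_skolem_vector(F, psi, 2, 2)
# The constant function 1 for x_1 is not a Skolem function here
assert not is_skolem_vector(F, [lambda ys: 1, lambda ys: ys[1]], 2, 2)
```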

For 1 ≤ i ≤ j ≤ n, let **X**_i^j denote the subsequence (x_i, x_{i+1}, ..., x_j) and let F^{(i−1)}(**X**_i^n, **Y**) denote ∃**X**_1^{i−1} F(**X**_1^{i−1}, **X**_i^n, **Y**). It has been argued in [3,17,20,23] that, given a relational specification F(**X**, **Y**), the BFnS problem can be solved by first ordering the outputs, say as x_1 ≺ x_2 ≺ ··· ≺ x_n, and then synthesizing a function ψ_i(**X**_{i+1}^n, **Y**) for each x_i such that F^{(i−1)}(ψ_i, **X**_{i+1}^n, **Y**) ⇔ ∃x_i F^{(i−1)}(x_i, **X**_{i+1}^n, **Y**). Once all such ψ_i are obtained, one can substitute ψ_{i+1} through ψ_n for x_{i+1} through x_n, respectively, in ψ_i to obtain a Skolem function for x_i as a function of only **Y**. We adopt this approach, and therefore focus on obtaining ψ_i in terms of **X**_{i+1}^n and **Y**.
Furthermore, we know from [20,23] that a function ψ_i is a Skolem function for x_i iff it satisfies Δ_i^F ⇒ ψ_i ⇒ ¬Γ_i^F, where Δ_i^F ≡ ¬∃**X**_1^{i−1} F(**X**_1^{i−1}, 0, **X**_{i+1}^n, **Y**) and Γ_i^F ≡ ¬∃**X**_1^{i−1} F(**X**_1^{i−1}, 1, **X**_{i+1}^n, **Y**). When F is clear from the context, we often omit it and write Δ_i and Γ_i. It is easy to see that both Δ_i and ¬Γ_i serve as Skolem functions for x_i in F.

#### **3 Complexity-Theoretical Limits**

In this section, we investigate the computational complexity of BFnS. It is easy to see that BFnS can be solved in EXPTIME. Indeed, a naive solution would be to enumerate all possible values of inputs **Y** and invoke a SAT solver to find values of **X** corresponding to each valuation of **Y** that makes F(**X**, **Y**) true. This requires worst-case time exponential in the number of inputs and outputs, and may produce an exponential-sized circuit. Given this, one can ask whether we can develop a better algorithm that works faster and synthesizes "small" Skolem functions in all cases. Our first result shows that the existence of such small Skolem functions would violate hard complexity-theoretic conjectures.
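The naive exponential-time procedure just described can be sketched as follows (ours, for illustration): the inner search stands in for the per-valuation SAT call, and the returned lookup table is exactly the exponential-sized Skolem vector.

```python
from itertools import product

def naive_synthesis(F, n, m):
    """Naive BFnS: for each input valuation Y, search for an X making
    F(X, Y) true. The table is an exponential-size Skolem function vector."""
    table = {}
    for ys in product([0, 1], repeat=m):
        # In practice this inner search is one SAT-solver call per valuation.
        witness = next((xs for xs in product([0, 1], repeat=n) if F(xs, ys)),
                       (0,) * n)        # arbitrary default when no X exists
        table[ys] = witness
    return [lambda ys, i=i: table[ys][i] for i in range(n)]

F = lambda xs, ys: (xs[0] ^ ys[0]) and (xs[1] == ys[1])  # x0 = ¬y0, x1 = y1
psi = naive_synthesis(F, 2, 2)
assert all(F(tuple(p(ys) for p in psi), ys) for ys in product([0, 1], repeat=2))
```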


A consequence of the second statement is that, under the same hypothesis, there must exist an instance of BFnS for which any algorithm must take exponential time. The exponential-time hypothesis ETH and its strengthened version, the non-uniform exponential-time hypothesis ETH_nu, are unproven computational hardness assumptions (see [14,18]), which have been used to show that several classical decision, functional and parametrized NP-complete problems (such as p-Clique) are unlikely to have sub-exponential algorithms. ETH_nu states that there is no family of algorithms (one for each input size n) that can solve 3-SAT in subexponential time. In [14] it is shown that if ETH_nu holds, then p-Clique, the *parametrized clique problem*, cannot be solved in sub-exponential time, i.e., for all d ∈ ℕ and sufficiently large fixed k, determining whether a graph G has a clique of size k is not in DTIME(n^d).

*Proof.* We describe a reduction from p-Clique to BFnS. Given an undirected graph G = (V, E) on n vertices and a number k (encoded in binary), we want to check if G has a clique of size k. We encode the graph as follows: each vertex v ∈ V is identified by a unique number in {1, ..., n}, and for every (i, j) ∈ V × V, we introduce an input variable y_{i,j} that is set to 1 iff (i, j) ∈ E. We call the resulting vector of input variables *y*. We also have additional input variables *z* = z_1, ..., z_m, which represent the binary encoding of k (m = ⌈log₂ k⌉). Finally, we introduce output variables x_v for each v ∈ V, whose values determine which vertices are present in the clique. Let *x* denote the vector of x_v variables.

Given inputs **Y** = {*y*, *z*} and outputs **X** = {*x*}, our specification is represented by a circuit F over **X**, **Y** that verifies whether the vertices encoded by **X** indeed form a k-clique of the graph G. The circuit F is constructed as follows:

<sup>1</sup> Since the submission of this paper, we have obtained a sharper complexity result. Details of this can be found in [2].


Given an instance of p-Clique with inputs **Y** = {*y*, *z*}, we now consider the specification F(**X**, **Y**) as constructed above and feed it as input to any algorithm A for solving BFnS. Let **Ψ** be the Skolem function vector output by A. For each i ∈ {1, ..., n}, we now feed ψ_i to the input x_i of the circuit F. This effectively constructs a circuit for F(**Ψ**, **Y**). It is easy to see from the definition of Skolem functions that for every valuation of **Y**, the function F(**Ψ**, **Y**) evaluates to 1 iff the graph encoded by **Y** contains a clique of size k.

Using this reduction, we can complete the proofs of both our statements:


Theorem 1 implies that efficient algorithms for BFnS are unlikely. We therefore propose a two-phase algorithm to solve BFnS in practice. The first phase runs in polynomial time relative to an NP-oracle and generates polynomial-sized "approximate" Skolem functions. We show that under certain structural restrictions on the NNF representation of F, the first phase always returns exact Skolem functions. However, these structural restrictions may not always be met. An NP-oracle can be used to check if the functions computed by the first phase are indeed exact Skolem functions. In case they aren't, we proceed to the second phase of our algorithm that runs in worst-case exponential time. Below, we discuss the first phase in detail. The second phase is an adaptation of an existing CEGAR-based technique and is described briefly later.

#### **4 Phase 1: Efficient Polynomial-Sized Synthesis**

An easy consequence of the definition of unateness is the following.

**Proposition 1.** *If* F(**X**, **Y**) *is positive (resp. negative) unate in* x*i, then* ψ*<sup>i</sup>* = 1 *(resp.* ψ*<sup>i</sup>* = 0*) is a correct Skolem function for* x*i.*

All omitted proofs, including that of the above, may be found in [2]. The above result gives us a way to identify outputs x_i for which a Skolem function can be easily computed. Note that if x_i (resp. ¬x_i) is a pure literal in F, then F is positive (resp. negative) unate in x_i. However, the converse is not necessarily true; in general, a semantic check is necessary for unateness. In fact, it follows from the definition of unateness that F is positive (resp. negative) unate in x_i iff the formula η_i^+ (resp. η_i^−) defined below is unsatisfiable.

$$\eta\_i^+ = F(\mathbf{X}\_1^{i-1}, 0, \mathbf{X}\_{i+1}^n, \mathbf{Y}) \land \neg F(\mathbf{X}\_1^{i-1}, 1, \mathbf{X}\_{i+1}^n, \mathbf{Y}).\tag{1}$$

$$\eta\_i^- = F(\mathbf{X}\_1^{i-1}, 1, \mathbf{X}\_{i+1}^n, \mathbf{Y}) \land \neg F(\mathbf{X}\_1^{i-1}, 0, \mathbf{X}\_{i+1}^n, \mathbf{Y}).\tag{2}$$

Note that each such check involves a single invocation of an NP-oracle, and a variant of this method is described in [5].
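For illustration, the semantic unateness check of Eq. (1) can be sketched with exhaustive enumeration standing in for the single NP-oracle (SAT) call; the code and names below are ours.

```python
from itertools import product

def is_positive_unate(F, n, m, i):
    """F is positive unate in x_i iff
    eta_i^+ = F(..., x_i=0, ...) AND NOT F(..., x_i=1, ...) is unsat (Eq. 1)."""
    for xs in product([0, 1], repeat=n):
        for ys in product([0, 1], repeat=m):
            x0 = xs[:i] + (0,) + xs[i + 1:]
            x1 = xs[:i] + (1,) + xs[i + 1:]
            if F(x0, ys) and not F(x1, ys):   # found a model of eta_i^+
                return False
    return True

F = lambda xs, ys: xs[0] or (xs[1] and ys[0])   # positive unate in x_0 and x_1
assert is_positive_unate(F, 2, 1, 0) and is_positive_unate(F, 2, 1, 1)

G = lambda xs, ys: xs[0] ^ ys[0]                # binate in x_0
assert not is_positive_unate(G, 1, 1, 0)
```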

If F is binate in an output x_i, Proposition 1 doesn't help in synthesizing ψ_i. Towards synthesizing Skolem functions for such outputs, recall the definitions of Δ_i and Γ_i from Sect. 2. Clearly, if we can compute these functions, we can solve BFnS. While computing Δ_i and Γ_i *exactly* for all x_i is unlikely to be efficient in general (in light of Theorem 1), we show that polynomial-sized "good" approximations of Δ_i and Γ_i can be computed efficiently. As our experiments show, these approximations are good enough to solve BFnS for several benchmarks. Furthermore, with access to an NP-oracle, we can also check when these approximations are indeed good enough.

Given a relational specification F(**X**, **Y**), we use F̂(**X**, **X̄**, **Y**) to denote the formula obtained by first converting F to NNF, and then replacing every occurrence of ¬x_i (x_i ∈ **X**) in the NNF formula with a fresh variable x̄_i. As an example, suppose F(**X**, **Y**) = (x_1 ∨ ¬(x_2 ∨ y_1)) ∨ ¬(x_2 ∨ ¬(y_2 ∧ ¬y_1)). Then F̂(**X**, **X̄**, **Y**) = (x_1 ∨ (x̄_2 ∧ ¬y_1)) ∨ (x̄_2 ∧ y_2 ∧ ¬y_1). The following are easy to see.

**Proposition 2.** *(a)* F̂(**X**, **X̄**, **Y**) *is positive unate in both* **X** *and* **X̄**. *(b) Let* ¬**X** *denote* (¬x_1, ..., ¬x_n)*. Then* F(**X**, **Y**) ⇔ F̂(**X**, ¬**X**, **Y**)*.*
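The construction of F̂ can be sketched as a literal-renaming pass over an NNF AST; the snippet below (our own tuple encoding, not the paper's AIG implementation) also checks Proposition 2(b) exhaustively on the running example.

```python
from itertools import product

def hat(node, outputs):
    """Build F̂ from an NNF AST: replace each negated output literal ¬x_i
    by a fresh positive variable x_i_bar."""
    op = node[0]
    if op == 'lit':
        var, pos = node[1], node[2]
        if var in outputs and not pos:
            return ('lit', var + '_bar', 1)
        return node
    return (op,) + tuple(hat(c, outputs) for c in node[1:])

def evaluate(node, assign):
    op = node[0]
    if op == 'lit':
        v = assign[node[1]]
        return v if node[2] else 1 - v
    vals = [evaluate(c, assign) for c in node[1:]]
    return max(vals) if op == 'or' else min(vals)

# NNF of the running example: (x1 ∨ (¬x2 ∧ ¬y1)) ∨ (¬x2 ∧ y2 ∧ ¬y1)
F = ('or', ('or', ('lit', 'x1', 1),
                  ('and', ('lit', 'x2', 0), ('lit', 'y1', 0))),
           ('and', ('lit', 'x2', 0), ('lit', 'y2', 1), ('lit', 'y1', 0)))
Fh = hat(F, {'x1', 'x2'})

# Proposition 2(b): F(X, Y) <=> F̂(X, ¬X, Y)
for bits in product([0, 1], repeat=4):
    a = dict(zip(['x1', 'x2', 'y1', 'y2'], bits))
    a2 = {**a, 'x1_bar': 1 - a['x1'], 'x2_bar': 1 - a['x2']}
    assert evaluate(F, a) == evaluate(Fh, a2)
```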

For every i ∈ {1, ..., n}, we can split **X** = (x_1, ..., x_n) into two parts, **X**_1^i and **X**_{i+1}^n, and represent F̂(**X**, **X̄**, **Y**) as F̂(**X**_1^i, **X**_{i+1}^n, **X̄**_1^i, **X̄**_{i+1}^n, **Y**). We use these representations of F̂ interchangeably, depending on the context. For b, c ∈ {0, 1}, let **b**^i (resp. **c**^i) denote a vector of i b's (resp. c's). For notational convenience, we use F̂(**b**^i, **X**_{i+1}^n, **c**^i, **X̄**_{i+1}^n, **Y**) to denote F̂(**X**_1^i, **X**_{i+1}^n, **X̄**_1^i, **X̄**_{i+1}^n, **Y**)|_{**X**_1^i = **b**^i, **X̄**_1^i = **c**^i} in the subsequent discussion. The following is an easy consequence of Proposition 2.

**Proposition 3.** *For every* i ∈ {1, ..., n}*, the following holds:*

$$\widehat{F}(\mathbf{0}^i, \mathbf{X}_{i+1}^n, \mathbf{0}^i, \neg\mathbf{X}_{i+1}^n, \mathbf{Y}) \;\Rightarrow\; \exists\mathbf{X}_1^i\, F(\mathbf{X}, \mathbf{Y}) \;\Rightarrow\; \widehat{F}(\mathbf{1}^i, \mathbf{X}_{i+1}^n, \mathbf{1}^i, \neg\mathbf{X}_{i+1}^n, \mathbf{Y})$$

Proposition 3 allows us to bound Δ_i and Γ_i as follows.

**Lemma 1.** *For every* x_i ∈ **X***, we have:*

$$\begin{aligned}
(a)\;\; &\neg\widehat{F}(\mathbf{1}^{i-1}0, \mathbf{X}_{i+1}^n, \mathbf{1}^i, \neg\mathbf{X}_{i+1}^n, \mathbf{Y}) \Rightarrow \Delta_i \Rightarrow \neg\widehat{F}(\mathbf{0}^i, \mathbf{X}_{i+1}^n, \mathbf{0}^{i-1}1, \neg\mathbf{X}_{i+1}^n, \mathbf{Y})\\
(b)\;\; &\neg\widehat{F}(\mathbf{1}^i, \mathbf{X}_{i+1}^n, \mathbf{1}^{i-1}0, \neg\mathbf{X}_{i+1}^n, \mathbf{Y}) \Rightarrow \Gamma_i \Rightarrow \neg\widehat{F}(\mathbf{0}^{i-1}1, \mathbf{X}_{i+1}^n, \mathbf{0}^i, \neg\mathbf{X}_{i+1}^n, \mathbf{Y})
\end{aligned}$$

In the remainder of the paper, we only use under-approximations of Δ_i and Γ_i, and use δ_i and γ_i respectively, to denote them. Recall from Sect. 2 that both Δ_i and ¬Γ_i suffice as Skolem functions for x_i. Therefore, we propose to use either δ_i or ¬γ_i (depending on which has a smaller AIG), obtained from Lemma 1, as our approximation of ψ_i. Specifically,

$$\begin{aligned} \delta\_i &= \neg \hat{F}(\mathbf{1}^{i-1}0, \mathbf{X}\_{i+1}^n, \mathbf{1}^i, \neg \mathbf{X}\_{i+1}^n, \mathbf{Y}), \ \gamma\_i = \neg \hat{F}(\mathbf{1}^i, \mathbf{X}\_{i+1}^n, \mathbf{1}^{i-1}0, \neg \mathbf{X}\_{i+1}^n, \mathbf{Y})\\ \psi\_i &= \delta\_i \text{ or } \neg \gamma\_i, \text{ depending on which has a smaller AIG} \end{aligned} \tag{3}$$

*Example 1.* Consider the specification **X** = **Y**, expressed in NNF as F(**X**, **Y**) ≡ ⋀_{i=1}^n ((x_i ∧ y_i) ∨ (¬x_i ∧ ¬y_i)). As noted in [33], this is a difficult example for CEGAR-based QBF solvers when n is large. From Eq. 3, δ_i = ¬(¬y_i ∧ ⋀_{j=i+1}^n (x_j ⇔ y_j)) = y_i ∨ ⋁_{j=i+1}^n (x_j ⇔ ¬y_j), and γ_i = ¬(y_i ∧ ⋀_{j=i+1}^n (x_j ⇔ y_j)) = ¬y_i ∨ ⋁_{j=i+1}^n (x_j ⇔ ¬y_j). With δ_i as the choice of ψ_i, we obtain ψ_i = y_i ∨ ⋁_{j=i+1}^n (x_j ⇔ ¬y_j). Clearly, ψ_n = y_n. On reverse-substituting, we get ψ_{n−1} = y_{n−1} ∨ (ψ_n ⇔ ¬y_n) = y_{n−1} ∨ 0 = y_{n−1}. Continuing in this way, we get ψ_i = y_i for all i ∈ {1, ..., n}. The same result is obtained regardless of whether we choose δ_i or ¬γ_i for each ψ_i. Thus, our approximation is good enough to solve this problem. In fact, it can be shown that δ_i = Δ_i and γ_i = Γ_i for all i ∈ {1, ..., n} in this example.
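Example 1's reverse-substitution argument can be checked mechanically for small n; the sketch below (ours) encodes ψ_i = y_i ∨ ⋁_{j>i} (x_j ⇔ ¬y_j) and verifies that substitution collapses it to y_i.

```python
from itertools import product

n = 4

def psi(i, xs_after, ys):
    """psi_i from Example 1: y_i OR OR_{j>i} (x_j <=> NOT y_j).
    xs_after holds the values of x_{i+1}, ..., x_n."""
    return ys[i] or any(x == 1 - ys[j]
                        for x, j in zip(xs_after, range(i + 1, n)))

def skolem(i, ys):
    """Reverse-substitute psi_{i+1}, ..., psi_n into psi_i to eliminate the x's."""
    xs_after = [skolem(j, ys) for j in range(i + 1, n)]
    return int(psi(i, xs_after, ys))

# After reverse-substitution, psi_i = y_i for every input valuation
for ys in product([0, 1], repeat=n):
    assert all(skolem(i, ys) == ys[i] for i in range(n))
```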

Note that the approximations of Skolem functions, as given in Eq. (3), are efficiently computable for all i ∈ {1, ..., n}, as they involve evaluating F̂ with a subset of inputs set to constants. This takes no more than O(|F|) time and space. As illustrated by Example 1, these approximations also often suffice to solve BFnS. The following theorem partially explains this.

**Theorem 2.** *(a) For* i ∈ {1, ..., n}*, suppose the following holds:*

$$\forall j \in \{1,\ldots,i\}\quad \widehat{F}(\mathbf{1}^j, \mathbf{X}_{j+1}^n, \mathbf{1}^j, \neg\mathbf{X}_{j+1}^n, \mathbf{Y}) \Rightarrow \widehat{F}(\mathbf{1}^{j-1}0, \mathbf{X}_{j+1}^n, \mathbf{1}^{j-1}1, \neg\mathbf{X}_{j+1}^n, \mathbf{Y}) \;\lor\; \widehat{F}(\mathbf{1}^{j-1}1, \mathbf{X}_{j+1}^n, \mathbf{1}^{j-1}0, \neg\mathbf{X}_{j+1}^n, \mathbf{Y})$$

*Then* ∃**X**_1^i F(**X**, **Y**) ⇔ F̂(**1**^i, **X**_{i+1}^n, **1**^i, ¬**X**_{i+1}^n, **Y**)*.*

*(b) If* F̂(**X**, ¬**X**, **Y**) *is in wDNNF, then* δ_i = Δ_i *and* γ_i = Γ_i *for every* i ∈ {1, ..., n}*.*

*Proof.* To prove part (a), we use induction on i. The base case corresponds to i = 1. Recall that ∃**X**_1^1 F(**X**, **Y**) ⇔ F̂(1, **X**_2^n, 0, ¬**X**_2^n, **Y**) ∨ F̂(0, **X**_2^n, 1, ¬**X**_2^n, **Y**) by definition. Proposition 3 already asserts that ∃**X**_1^1 F(**X**, **Y**) ⇒ F̂(1, **X**_2^n, 1, ¬**X**_2^n, **Y**). Therefore, if the condition in Theorem 2(a) holds for i = 1, we then have F̂(1, **X**_2^n, 1, ¬**X**_2^n, **Y**) ⇔ F̂(1, **X**_2^n, 0, ¬**X**_2^n, **Y**) ∨ F̂(0, **X**_2^n, 1, ¬**X**_2^n, **Y**), which in turn is equivalent to ∃**X**_1^1 F(**X**, **Y**). This proves the base case.

Let us now assume (inductive hypothesis) that the statement of Theorem 2(a) holds for 1 ≤ i < n. We prove below that the same statement holds for i + 1 as well. Clearly, ∃**X**_1^{i+1} F(**X**, **Y**) ⇔ ∃x_{i+1} (∃**X**_1^i F(**X**, **Y**)). By the inductive hypothesis, this is equivalent to ∃x_{i+1} F̂(**1**^i, **X**_{i+1}^n, **1**^i, ¬**X**_{i+1}^n, **Y**). By definition of existential quantification, this is equivalent to F̂(**1**^{i+1}, **X**_{i+2}^n, **1**^i 0, ¬**X**_{i+2}^n, **Y**) ∨ F̂(**1**^i 0, **X**_{i+2}^n, **1**^{i+1}, ¬**X**_{i+2}^n, **Y**). From the condition in Theorem 2(a), we also have

$$\widehat{F}(\mathbf{1}^{i+1}, \mathbf{X}_{i+2}^n, \mathbf{1}^{i+1}, \neg\mathbf{X}_{i+2}^n, \mathbf{Y}) \Rightarrow \widehat{F}(\mathbf{1}^{i}0, \mathbf{X}_{i+2}^n, \mathbf{1}^{i}1, \neg\mathbf{X}_{i+2}^n, \mathbf{Y}) \;\lor\; \widehat{F}(\mathbf{1}^{i}1, \mathbf{X}_{i+2}^n, \mathbf{1}^{i}0, \neg\mathbf{X}_{i+2}^n, \mathbf{Y})$$

The implication in the reverse direction follows from Proposition 2(a). Thus we have a bi-implication above, which we have already seen is equivalent to <sup>∃</sup>**X***<sup>i</sup>*+1 <sup>1</sup> F(**X**, **Y**). This proves the inductive case. To prove part (b), we first show that if <sup>F</sup>-

(**X**,¬**X**, **<sup>Y</sup>**) is in wDNNF, then the condition in Theorem 2(a) must hold for all <sup>j</sup> ∈ {1,...n}. Theorem 2(b) then follows from the definitions of Δ*<sup>i</sup>* and Γ*<sup>i</sup>* (see Sect. 2), from the statement of Theorem 2(a) and from the definitions of δ*<sup>i</sup>* and γ*<sup>i</sup>* (see Eq. 3). *<sup>j</sup>*+1, **<sup>Y</sup>**) denote the formula <sup>F</sup>-

For <sup>j</sup> ∈ {1,...n}, let <sup>ζ</sup>(**X***<sup>n</sup> <sup>j</sup>*+1, **<sup>X</sup>***<sup>n</sup>* (**1***<sup>j</sup>* , **X***<sup>n</sup> <sup>j</sup>*+1, **1***<sup>j</sup>* , **X***<sup>n</sup> <sup>j</sup>*+1, **<sup>Y</sup>**) ∧ ¬ F-(**1***<sup>j</sup>*−<sup>1</sup>0, **X***<sup>n</sup> <sup>j</sup>*+1, **<sup>1</sup>***<sup>j</sup>*−<sup>1</sup>1, **<sup>X</sup>***<sup>n</sup> <sup>j</sup>*+1, **<sup>Y</sup>**) <sup>∨</sup> <sup>F</sup>-(**1***<sup>j</sup>*−<sup>1</sup>1, **X***<sup>n</sup> <sup>j</sup>*+1, **<sup>1</sup>***<sup>j</sup>*−<sup>1</sup>0, **X***n <sup>j</sup>*+1, **Y**) . Suppose, if possible, <sup>F</sup>-(**X**,¬**X**, **<sup>Y</sup>**) is in wDNNF but there exists <sup>j</sup> (1 <sup>≤</sup> <sup>j</sup> <sup>≤</sup> <sup>n</sup>) such that <sup>ζ</sup>(**X***<sup>n</sup> <sup>j</sup>*+1, **<sup>X</sup>***<sup>n</sup> <sup>j</sup>*+1, **Y**) is satisfiable. Let **X***<sup>n</sup> <sup>j</sup>*+1 <sup>=</sup> <sup>σ</sup>, **<sup>X</sup>***<sup>n</sup> <sup>j</sup>*+1 = κ and **Y** = θ be a satisfying assignment of ζ. We now consider the simplified circuit obtained by substituting **1***<sup>j</sup>*−<sup>1</sup> for **X***<sup>j</sup>*−<sup>1</sup> <sup>1</sup> as well as for **<sup>X</sup>***<sup>j</sup>*−<sup>1</sup> <sup>1</sup> , σ for **X***<sup>n</sup> <sup>j</sup>*+1, <sup>κ</sup> for **X***n <sup>j</sup>*+1 and <sup>θ</sup> for **<sup>Y</sup>** in the AIG for <sup>F</sup>-. This simplification replaces the output of every internal node with a constant (0 or 1), if the node evaluates to a constant under the above assignment. Note that the resulting circuit can have only x*<sup>j</sup>* and x*<sup>j</sup>* as its inputs. Furthermore, since the assignment satisfies ζ, it follows that the simplified circuit evaluates to 1 if both x*<sup>j</sup>* and x*<sup>j</sup>* are set to 1, and it evaluates to 0 if any one of x*<sup>j</sup>* or x*<sup>j</sup>* is set to 0. 
This can only happen if there is a node labeled <sup>∧</sup> in the AIG representing <sup>F</sup>-(**X**,¬**X**, **<sup>Y</sup>**) with a path leading from the leaf labeled <sup>x</sup>*<sup>j</sup>* , and another path leading from the leaf labeled <sup>¬</sup>x*<sup>j</sup>* . This is a contradiction, since <sup>F</sup>-(**X**,¬**X**, **<sup>Y</sup>**) is in wDNNF. Therefore, there is no <sup>j</sup> ∈ {1,...n} such that the condition of Theorem 2(a) is violated.

In general, the candidate Skolem functions generated from the approximations discussed above may not always be correct. Indeed, the conditions discussed above are only sufficient, but not necessary, for the approximations to be exact. Hence, we need a separate check to see if our candidate Skolem functions are correct. To do this, we use an *error formula* $\varepsilon_{\mathbf{\Psi}}(\mathbf{X}', \mathbf{X}, \mathbf{Y}) \equiv F(\mathbf{X}', \mathbf{Y}) \wedge \bigwedge_{i=1}^{n}(x_i \leftrightarrow \psi_i) \wedge \neg F(\mathbf{X}, \mathbf{Y})$, as described in [23], and check its satisfiability. The correctness of this check depends on the following result from [23].

**Theorem 3 (**[23]**).** ε**<sup>Ψ</sup>** *is unsatisfiable iff* **Ψ** *is a correct Skolem function vector.*

#### **Algorithm 1.** bfss

**Input**: $\hat{F}(\mathbf{X}, \mathbf{Y})$ in NNF (or wDNNF), with inputs $|\mathbf{Y}| = m$ and outputs $|\mathbf{X}| = n$
**Output**: Candidate Skolem functions $\mathbf{\Psi} = (\psi_1, \ldots, \psi_n)$

```
 1  Initialize: U0 = U1 = ∅
 2  repeat
 3      // repeatedly check for unate variables
 4      for each xi ∈ X \ (U0 ∪ U1) do
 5          if F is positive unate in xi        // xi pure, or check η+_i (Eq. 1)
 6          then
 7              F̂ := F̂[xi = 1]; U1 := U1 ∪ {xi}
 8          else if F is negative unate in xi   // ¬xi pure, or check η−_i (Eq. 2)
 9          then
10              F̂ := F̂[xi = 0]; U0 := U0 ∪ {xi}
11  until F̂ is unchanged                       // no unate variables remaining
12  Choose an ordering of X                    // Sect. 6 discusses the ordering used
13  for each xi ∈ X in order do                // assume x1 ≺ x2 ≺ ... ≺ xn
14      if xi ∈ Uj for j ∈ {0, 1}
15      then
16          ψi := j
17      else
18          ψi := as defined in Eq. 3
19  if error formula εΨ is UNSAT then
20      terminate and output Ψ
21  else
22      call Phase 2
```

We now combine all the above ingredients to come up with algorithm bfss (for *Blazingly Fast Skolem Synthesis*), as shown in Algorithm 1. The algorithm can be divided into three parts. In the first part (lines 2-11), unateness is checked. This is done in two ways: (i) we identify pure literals in F by simply examining the labels of leaves in the DAG representation of F in NNF, and (ii) we check the satisfiability of the formulas $\eta_i^+$ and $\eta_i^-$, as defined in Eqs. 1 and 2. This requires invoking a SAT solver in the worst case, and is repeated at most $O(n^2)$ times until there are no more unate variables. Hence this requires $O(n^2)$ calls to a SAT solver. Once we have done this, by Proposition 1, the constants 1 or 0 (for positive or negative unate variables respectively) are correct Skolem functions for these variables.

In the second part, we fix an ordering of the remaining output variables according to an experimentally sound heuristic, as described in Sect. 6, and compute candidate Skolem functions for these variables according to Eq. 3. We then check the satisfiability of the error formula ε**Ψ** to determine whether the candidate Skolem functions are indeed correct. If the error formula is found to be unsatisfiable, we know from Theorem 3 that we have the correct Skolem functions, which can therefore be output. This concludes phase 1 of algorithm bfss. If the error formula is found to be satisfiable, we move to phase 2 of algorithm bfss – an adaptation of the CEGAR-based technique described in [23], and discussed briefly in Sect. 5. It is not difficult to see that the running time of phase 1 is polynomial in the size of the input, relative to an NP-oracle (a SAT solver in practice). This also implies that the Skolem functions generated can be of at most polynomial size. Finally, from Theorem 2 we also obtain that if F satisfies the condition of Theorem 2(a), the Skolem functions generated in phase 1 are correct. From the above reasoning, we obtain the following properties of phase 1 of bfss:


*Discussion:* We make two crucial and related observations. First, by our hardness results in Sect. 3, we know that the above algorithm cannot solve BFnS for all inputs, unless some well-regarded complexity-theoretic conjectures fail. As a result, we must go to phase 2 on at least some inputs. Surprisingly, our experiments show that this is not necessary in the majority of benchmarks.

The second observation tries to understand why phase 1 works in most cases in practice. While a conclusive explanation isn't easy, we believe Theorem 2 explains the success of phase 1 in several cases. By [15], we know that all Boolean functions have a DNNF (and hence wDNNF) representation, although it may take exponential time to compute this representation. This allows us to define two preprocessing procedures. In the first, we identify cases where we can directly convert to wDNNF and use the Phase 1 algorithm above. And in the second, we use several optimization scripts available in the ABC [26] library to optimize the AIG representation of $\hat{F}$. For a majority of benchmarks, this appears to yield a representation of F that allows the proof of Theorem 2(a) to go through. For the rest, we apply the Phase 2 algorithm as described below.

*Quantitative guarantees of "goodness".* Given our theoretical and practical insights into the applicability of phase 1 of bfss, it would be interesting to measure how much progress we have made in phase 1, even if it does not give the correct Skolem functions. One way to measure this "goodness" is to estimate the number of counterexamples as a fraction of the size of the input space. Specifically, given the error formula, we get an approximate count of the number of models for this formula *projected on the inputs* **Y**. This can be obtained efficiently in practice with high confidence using state-of-the-art approximate model counters, viz. [13], with complexity in $\mathsf{BPP}^{\mathsf{NP}}$. The approximate count thus obtained, when divided by $2^{|\mathbf{Y}|}$, gives the fraction of input combinations for which the candidate Skolem functions output by phase 1 do not work correctly. We call this the *goodness ratio* of our approximation.

#### **5 Phase 2: Counterexample-Guided Refinement**

For phase 2, we can use any off-the-shelf worst-case exponential-time Skolem function generator. However, given that we already have candidate Skolem functions with guarantees on their "goodness", it is natural to use them as starting points for phase 2. Hence, we start off with candidate Skolem functions for all x*<sup>i</sup>* as computed in phase 1, and then update (or refine) them in a counterexample-driven manner. Intuitively, a counterexample is a value of the inputs **Y** for which there exists a value of **X** that renders F(**X**, **Y**) true, but for which F(**Ψ**, **Y**) evaluates to false. As shown in [23], given a candidate Skolem function vector, every satisfying assignment of the error formula ε**<sup>Ψ</sup>** gives a counterexample. The refinement step uses this satisfying assignment to update an appropriate subset of the approximate δ*<sup>i</sup>* and γ*<sup>i</sup>* functions computed in phase 1. The entire process is then repeated until no counterexamples can be found. The final updated vector of Skolem functions then gives a solution of the BFnS problem. Note that this idea is not new [3,23]. The only significant enhancement we make over the algorithm in [23] is to use an almost-uniform sampler [12] to efficiently sample the space of counterexamples almost uniformly. This allows us to do refinement with a diverse set of counterexamples, instead of using counterexamples from a corner of the solution space of ε**<sup>Ψ</sup>** that the SAT solver heuristics zoom in on.

#### **6 Experimental Results**

**Experimental methodology.** Our implementation consists of two parallel pipelines that accept the same input specification but represent it in two different ways. The first pipeline takes the input formula as an AIG and builds an NNF (not necessarily wDNNF) DAG, while the second pipeline builds an ROBDD from the input AIG using dynamic variable reordering (no restrictions on variable order), and then obtains a wDNNF representation from it using the linear-time algorithm described in [15]. Once the NNF/wDNNF representation is built, we use Algorithm 1 in Phase 1 and CEGAR-based synthesis using UniGen [12] to sample counterexamples in Phase 2. We call this ensemble of two pipelines bfss. We compare bfss with the following algorithms/tools: (i) parSyn [3], (ii) Cadet [33], (iii) RSynth [38], and (iv) AbsSynthe-Skolem (based on the BFnS step of AbsSynthe [10]).

Our implementation of bfss uses the ABC [26] library to represent and manipulate Boolean functions. Two different SAT solvers can be used with bfss: ABC's default SAT solver, or UniGen [12] (to give almost-uniformly distributed counterexamples). All our experiments use UniGen.

We consider a total of 504 benchmarks, taken from four different domains: (a) forty-eight *Arithmetic benchmarks* from [17], with varying bit-widths (viz. 32, 64, 128, 256, 512 and 1024) of arithmetic operators, (b) sixty-eight *Disjunctive Decomposition benchmarks* from [3], generated by considering some of the larger sequential circuits in the HWMCC10 benchmark suite, (c) five *Factorization benchmarks*, also from [3], representing factorization of numbers of different bit-widths (8, 10, 12, 14, 16), and (d) three hundred and eighty three *QBFEval benchmarks*, taken from the Prenex 2QBF track of QBFEval 2017 [32] <sup>2</sup>. Since different tools accept benchmarks in different formats, each benchmark was converted to both qdimacs and verilog/aiger formats. All benchmarks and the procedure by which we generated (and converted) them are detailed in [1]. Recall that we use two pipelines for bfss. We use "balance; rewrite -l; refactor -l; balance; rewrite -l; rewrite -lz; balance; refactor -lz; rewrite -lz; balance" as the ABC script for optimizing the AIG representation of the input specification. We observed that while this results in only 4 benchmarks being in wDNNF in the first pipeline, 219 benchmarks were solved in Phase 1 using this pipeline. This is attributable to specifications being unate in several output variables, and also satisfying the condition of Theorem 2(a) (while not being in wDNNF). In the second pipeline, however, we could represent 230 benchmarks in wDNNF, and all of these were solved in Phase 1.

For each benchmark, the order (ref. step 12 of Algorithm 1) in which Skolem functions are generated is such that the variable occurring in the transitive fan-in of the fewest nodes in the AIG representation of the specification is ordered before the other variables. This order is used for both bfss and parSyn. Note that this order is completely independent of the dynamic variable order used to construct an ROBDD of the input specification in the second pipeline, prior to obtaining the wDNNF representation.

All experiments were performed on a message-passing cluster with 20 cores and 64 GB memory per node, each core being a 2.2 GHz Intel Xeon processor. The operating system was CentOS 6.5. Twenty cores were assigned to each run of parSyn. For RSynth and Cadet, a single core on the cluster was used, since these tools do not exploit parallel processing. Each pipeline of bfss was executed on a single node; the computation of candidate functions, the building of the error formula and the refinement with counterexamples were performed sequentially on one thread, while UniGen had 19 threads at its disposal (idle during Phase 1).

The maximum time given for execution of any run was 3600 s. The total amount of main memory for any run was restricted to 16 GB. The metric used to compare the algorithms was the *time taken to synthesize Boolean functions*. The time reported for bfss is the better of the two times obtained from the alternative pipelines described above. Detailed results from the individual pipelines are available in [2].

**Results.** Of the 504 benchmarks, 177 benchmarks were not solved by any tool – 6 of these being from arithmetic benchmarks and 171 from QBFEval.

Table 1 gives a summary of the performance of bfss (considering the combined pipelines) over different benchmarks suites. Of the 504 benchmarks, bfss

<sup>2</sup> The track contains 384 benchmarks, but we were unsuccessful in converting 1 benchmark to some of the formats required by the various tools.


**Table 1.** bfss: Performance summary of combined pipelines

was successful on 278 benchmarks; of these, 170 are from QBFEval, 68 from Disjunctive Decomposition, 35 from Arithmetic and 5 from Factorization.

Of the 383 benchmarks in the QBFEval suite, we ran bfss only on 254 since we could not build succinct AIGs for the remaining benchmarks. Of these, 159 *benchmarks were solved by Phase 1 (i.e., 62% of built QBFEval benchmarks)* and 73 proceeded to Phase 2, of which 11 reached completion. On another 11 QBFEval benchmarks Phase 1 timed out. Of the 48 Arithmetic benchmarks, *Phase 1 successfully solved* 35 *(i.e.,* ∼ 72%*)* and Phase 2 was started for 8 benchmarks; Phase 1 timed out on 5 benchmarks. Of the 68 Disjunctive Decomposition benchmarks, *Phase 1 successfully solved* 66 *benchmarks (i.e., 97%)*, and Phase 2 was started and reached completion for 2 benchmarks. For the 5 Factorization benchmarks, Phase 1 was successful on all 5 benchmarks.

Recall that the goodness ratio is the ratio of the number of *counterexamples remaining* after Phase 1 to the *total size of the input space*. For all benchmarks solved by Phase 1, the goodness ratio is 0. We analyzed the goodness ratio at the beginning of Phase 2 for the 83 benchmarks on which Phase 2 started. For 13 benchmarks this ratio was small (< 0.002), and Phase 2 reached completion for these. Of the remaining benchmarks, 34 also had a small goodness ratio (< 0.1), indicating that we were close to the solution at the time of timeout. However, 27 benchmarks in QBFEval had a goodness ratio above 0.9, indicating that most of the counterexamples had not been eliminated by the time of timeout.

We next compare the performance of bfss with other state-of-the-art tools. For clarity, since the number of benchmarks in the QBFEval suite is considerably larger than in the other suites, we plot the QBFEval benchmarks separately.

bfss vs Cadet: Of the 504 benchmarks, Cadet was successful on 231 benchmarks, of which 24 belonged to Disjunctive Decomposition, 22 to Arithmetic, 1 to Factorization and 184 to QBFEval. Figure 1(a) gives the performance of the two algorithms with respect to time on the QBFEval suite. Here, Cadet solved 35 benchmarks that bfss could not solve, whereas bfss solved 21 benchmarks that could not be solved by Cadet. Figure 1(b) gives the performance of the two algorithms with respect to time on the Arithmetic, Factorization and Disjunctive Decomposition benchmarks. In these categories, there were a total of 62 benchmarks that bfss solved that Cadet could not solve, and there was 1 benchmark that Cadet solved but bfss did not solve. While Cadet takes less time on Arithmetic benchmarks and many QBFEval benchmarks, on Disjunctive Decomposition and Factorization, bfss takes less time.

**Fig. 1.** bfss vs Cadet: Legend: Q: QBFEval, A: Arithmetic, F: Factorization, D: Disjunctive Decomposition. TO: benchmarks for which the corresponding algorithm was unsuccessful.

**Fig. 2.** bfss vs parSyn (for legend see Fig. 1)

bfss vs parSyn: Fig. 2 shows the comparison of time taken by bfss and parSyn. parSyn was successful on a total of 185 benchmarks, and could solve 1 benchmark which bfss could not solve. On the other hand, bfss solved 94 benchmarks that parSyn could not solve. From Fig. 2, we can see that on most of the Arithmetic, Disjunctive Decomposition and Factorization benchmarks, bfss takes less time than parSyn.

bfss vs RSynth: We next compare the performance of bfss with RSynth. As shown in Fig. 3, RSynth was successful on 51 benchmarks, with 4 benchmarks that could be solved by RSynth but not by bfss. In contrast, bfss could solve 231 benchmarks that RSynth could not solve! Of the benchmarks that were solved by both solvers, we can see that bfss took less time on most of them.

bfss vs AbsSynthe-Skolem: AbsSynthe-Skolem was successful on 217 benchmarks, and could solve 31 benchmarks that bfss could not solve. In contrast, bfss solved a total of 92 benchmarks that AbsSynthe-Skolem could not. Figure 4 shows a comparison of running times of bfss and AbsSynthe-Skolem.

**Fig. 3.** bfss vs RSynth (for legend see Fig. 1)

**Fig. 4.** bfss vs AbsSynthe-Skolem (for legend see Fig. 1)

### **7 Conclusion**

In this paper, we showed some complexity-theoretic hardness results for the Boolean functional synthesis problem. We then developed a two-phase approach to solve this problem, where the first phase, which is an efficient algorithm generating poly-sized functions surprisingly succeeds in solving a large number of benchmarks. To explain this, we identified sufficient conditions when phase 1 gives the correct answer. For the remaining benchmarks, we employed the second phase of the algorithm that uses a CEGAR-based approach and builds Skolem functions by exploiting recent advances in SAT solvers/approximate counters. As future work, we wish to explore further improvements in Phase 2, and other structural restrictions on the input that ensure completeness of Phase 1.

**Acknowledgements.** We are thankful to Ajith John, Kuldeep Meel, Mate Soos, Ocan Sankur, Lucas Martinelli Tabajara and Markus Rabe for useful discussions and for providing us with various software tools used in the experimental comparisons. We also thank the anonymous reviewers for insightful comments.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **Counterexample Guided Inductive Synthesis Modulo Theories**

Alessandro Abate<sup>1</sup> , Cristina David2,3(B) , Pascal Kesseli<sup>3</sup> , Daniel Kroening1,3 , and Elizabeth Polgreen<sup>1</sup>

> <sup>1</sup> University of Oxford, Oxford, UK
> <sup>2</sup> University of Cambridge, Cambridge, UK
> cd652@cam.ac.uk
> <sup>3</sup> Diffblue Ltd., Oxford, UK

**Abstract.** Program synthesis is the mechanised construction of software. One of the main difficulties is the efficient exploration of the very large solution space, and tools often require a user-provided syntactic restriction of the search space. We propose a new approach to program synthesis that combines the strengths of a counterexample-guided inductive synthesizer with those of a theory solver, exploring the solution space more efficiently without relying on user guidance. We call this approach CEGIS(T ), where T is a first-order theory. In this paper, we focus on one particular challenge for program synthesizers, namely the generation of programs that require non-trivial constants. This is a fundamentally difficult task for state-of-the-art synthesizers. We present two exemplars, one based on Fourier-Motzkin (FM) variable elimination and one based on first-order satisfiability. We demonstrate the practical value of CEGIS(T ) by automatically synthesizing programs for a set of intricate benchmarks.

#### **1 Introduction**

Program synthesis is the problem of finding a program that meets a correctness specification given as a logical formula. This is an active area of research in which substantial progress has been made in recent years.

In full generality, program synthesis is an exceptionally difficult problem, and thus, the research community has explored pragmatic restrictions. One particularly successful direction is *Syntax-Guided Program Synthesis* (SyGuS) [2]. The key idea of SyGuS is that the user supplements the logical specification with a syntactic template for the solution. Leveraging the user's intuition, SyGuS reduces the solution space size substantially, resulting in significant speed-ups.

Unfortunately, it is difficult to provide the syntactic template in many practical applications. A very obvious exemplar of the limits of the syntax-guided approach are programs that require non-trivial constants. In such a scenario, the

Supported by ERC project 280053 (CPROVER) and the H2020 FET OPEN 712689 SC<sup>2</sup>. Cristina David is supported by the Royal Society University Research Fellowship UF160079.

c The Author(s) 2018

H. Chockler and G. Weissenbacher (Eds.): CAV 2018, LNCS 10981, pp. 270–288, 2018. https://doi.org/10.1007/978-3-319-96145-3\_15

syntax-guided approach requires that the user provides the exact value of the constants in the solution.

For illustration, let's consider a user who wants to synthesize a program that rounds up a given 32-bit unsigned number x to the next highest power of two. If we denote the function computed by the program by f(x), then the specification can be written as $x < 2^{31} \Rightarrow f(x)\,\&\,(-f(x)) = f(x) \,\wedge\, f(x) \geq x \,\wedge\, 2x \geq f(x)$. The first conjunct forces f(x) to be a power of two, the others require it to be the next highest. A possible solution for this is given by the following C program:

```
x = x - 1;
x |= x >> 1;
x |= x >> 2;
x |= x >> 4;
x |= x >> 8;
x |= x >> 16;
x = x + 1;
```
It is improbable that the user knows that the constants in the solution are exactly 1, 2, 4, 8, 16, and thus, she will be unable to explicitly restrict the solution space. As a result, synthesizers are very likely to enumerate possible combinations of constants, which is highly inefficient.

In this paper we propose a new approach to program synthesis that combines the strengths of a counterexample-guided inductive synthesizer with those of a solver for a first-order theory in order to perform a more efficient exploration of the solution space, without relying on user guidance. Our inspiration for this proposal is DPLL(T ), which has boosted the performance of solvers for many fragments of quantifier-free first-order logic [16,23]. DPLL(T ) combines reasoning about the Boolean structure of a formula with reasoning about theory facts to decide satisfiability of a given formula.

In an attempt to generate similar technological advancements in program synthesis, we propose a new algorithm for program synthesis called CounterExample-Guided Inductive Synthesis(T ), where T is a given first-order theory for which we have a specialised solver. Similar to its counterpart DPLL(T ), the CEGIS(T ) architecture features communication between a synthesizer and a theory solver, which results in a much more efficient exploration of the search space.

While standard CEGIS architectures [19,30] already make use of SMT solvers, the typical role of such a solver is restricted to validating candidate solutions and providing concrete counterexamples that direct subsequent search. By contrast, CEGIS(T ) allows the theory solver to communicate generalised constraints back to the synthesizer, thus enabling more significant pruning of the search space.

There are instances of more sophisticated collaboration between a program synthesizer and theory solvers. The most obvious such instance is the program synthesizer inside the CVC4 SMT solver [27]. This approach features a very tight coupling between the two components (i.e., the synthesizer and the theory solvers) that takes advantage of the particular strengths of the SMT solver by reformulating the synthesis problem as the problem of refuting a universally quantified formula (SMT solvers are better at refuting universally quantified formulae than at proving them). Conversely, in our approach we maintain a clear separation between the synthesizer and the theory solver while performing comprehensive and well-defined communication between the two components. This enables the flexible combination of CEGIS with a variety of theory solvers, which excel at exploring different solution spaces.

#### **Contributions**


### **2 Preliminaries**

#### **2.1 The Program Synthesis Problem**

Program synthesis is the task of automatically generating programs that satisfy a given logical specification. A program synthesizer can be viewed as a solver for existential second-order logic. An existential second-order logic formula allows quantification over functions as well as ground terms [28].

The input specification provided to a program synthesizer is of the form ∃P. ∀*x*. σ(P, *x*), where P ranges over functions (where a function is represented by the program computing it), *x* ranges over ground terms, and σ is a quantifier-free formula.
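For example, a specification asking for a program P that computes the absolute value of its input (an illustrative instance, not one from this paper) can be written in this form as:

$$\exists P.\ \forall x.\ P(x) \geq 0 \,\wedge\, \big(P(x) = x \,\vee\, P(x) = -x\big)$$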

#### **2.2 CounterExample Guided Inductive Synthesis**

CounterExample-Guided Inductive Synthesis (CEGIS) is a popular approach to program synthesis, and is an iterative process. Each iteration performs inductive generalisation based on counterexamples provided by a verification oracle. Essentially, the inductive generalisation uses information about a limited number of inputs to make claims about all the possible inputs in the form of candidate solutions.

The CEGIS framework is illustrated in Fig. 1 and consists of two phases: the synthesis phase and the verification phase. Given the specification of the desired program, σ, the inductive synthesis procedure generates a candidate program P<sup>∗</sup> that satisfies σ(P∗, *x*) for a subset of all possible inputs. The candidate program P<sup>∗</sup> is passed to the verification phase, which checks whether

**Fig. 1.** CEGIS block diagram

it satisfies the specification σ(P∗, *x*) for all possible inputs. This is done by checking whether ¬σ(P∗, *x*) is unsatisfiable. If so, ∀*x*. σ(P∗, *x*) is valid, and we have successfully synthesized a solution and the algorithm terminates. Otherwise, the verifier produces a counterexample *c* from the satisfying assignment, which is then added to the set of inputs passed to the synthesizer, and the loop repeats.

The method used in the synthesis and verification blocks varies in different CEGIS implementations; our CEGIS implementation uses Bounded Model Checking [8].

#### **2.3 DPLL(***T* **)**

DPLL(T ) is an extension of the DPLL algorithm, used by most propositional SAT solvers, by a theory T . We give a brief overview of DPLL(T ) and compare DPLL(T ) with CEGIS(T ).

Given a formula F from a theory T, a propositional formula $F_p$ is created from F in which the theory atoms are replaced by Boolean variables (the "propositional skeleton"). The standard DPLL algorithm, comprising Decide, Boolean Constraint Propagation (BCP), Analyze-Conflict and BackTrack, generates an assignment to the Boolean variables in $F_p$, as illustrated in Fig. 2. The theory solver then checks whether this assignment is still consistent when the Boolean variables are replaced by their original atoms. If so, a satisfying assignment for F has been found. Otherwise, a constraint over the Boolean variables in $F_p$ is passed back to Decide, and the process repeats.

In the very first SMT solvers, a full assignment to the Boolean variables was obtained, and the theory solver returned only a single counterexample, similar to the CEGIS implementations that are standard now. Such SMT solvers are prone to enumerating all possible counterexamples, so a key improvement in DPLL(T) was the ability to pass back a more general constraint over the variables in the formula as a counterexample [16]. Furthermore, modern variants of DPLL(T) call the theory solver on partial assignments to the variables in F<sup>p</sup>. Our new synthesis algorithm offers equivalents of both of these ideas that improved DPLL(T).
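A drastically simplified sketch of this lazy interplay follows. It enumerates full propositional assignments and keeps a list of blocked assignments in place of learned clauses, whereas real DPLL(T) uses BCP, conflict analysis, and partial assignments; the atoms and skeleton below are invented toy examples.

```python
# Lazy "propositional enumeration + theory check" loop in the spirit of
# DPLL(T). Real solvers use Decide/BCP/Analyze-Conflict/BackTrack instead
# of brute-force enumeration, and learn clauses rather than whole blocks.
from itertools import product

# Theory atoms over an integer x: b0 stands for (x < 3), b1 for (x < 1).
atoms = [lambda x: x < 3, lambda x: x < 1]

# Propositional skeleton F^p over the Boolean variables b0, b1.
def prop_skeleton(b):
    return b[0] or b[1]

# Theory solver: is there an integer x realising the Boolean assignment?
def t_consistent(b, domain=range(-10, 11)):
    return any(all(atom(x) == bi for atom, bi in zip(atoms, b))
               for x in domain)

def lazy_smt():
    blocked = []  # assignments refuted by the theory solver
    for b in product([False, True], repeat=len(atoms)):
        if not prop_skeleton(b) or b in blocked:
            continue  # propositionally unsatisfying or already refuted
        if t_consistent(b):
            return b  # theory-consistent satisfying assignment found
        blocked.append(b)  # "learn" a block for this assignment
    return None

print(lazy_smt())  # (True, False): x < 3 holds, x < 1 does not, e.g. x = 1
```

The assignment (False, True) is propositionally fine but theory-inconsistent (x ≥ 3 ∧ x < 1), so it gets blocked before the consistent assignment (True, False) is found.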

**Fig. 2.** DPLL(T ) with theory propagation

#### **3 Motivating Example**

In each iteration of a standard CEGIS loop, the communication from the verification phase back to the synthesis phase is restricted to concrete counterexamples. This is particularly detrimental when synthesizing programs that require nontrivial constants. In such a setting, it is typical that a counterexample provided by the verification phase only eliminates a single candidate solution and, consequently, the synthesizer ends up enumerating possible constants.

For illustration, let us consider the trivial problem of synthesizing a function f(x) such that f(x) < 0 if x < 334455 and f(x) = 0 otherwise. One possible solution is f(x) = *ite* (x < 334455) −1 0, where *ite* stands for *if then else*.

In order to make the synthesis task even simpler, we assume that we know part of this solution, namely that it must be of the form f(x) = *ite* (x < ?) −1 0, where "?" is a placeholder for the missing constant that we must synthesize. A plausible scenario for a run of CEGIS is as follows: the synthesis phase guesses f(x) = *ite* (x < 0) −1 0, for which the verification phase returns x = 0 as a counterexample. In the next iteration of the CEGIS loop, the synthesis phase guesses f(x) = *ite* (x < 1) −1 0 (which works for x = 0), and the verifier produces x = 1 as a counterexample. Following the same pattern, the synthesis phase will enumerate all the candidates

$$f(x) = ite \ (x < 2) \ -1 \ 0$$

$$\cdots$$

$$f(x) = ite \ (x < 334454) \ -1 \ 0$$

before finding the solution. This is because each of the concrete counterexamples 0,..., 334454 eliminates only one candidate from the solution space. Consequently, we need to propagate more information from the verifier to the synthesis phase in each iteration of the CEGIS loop.

*Proving Properties of Programs.* Synthesis engines can be used as reasoning engines in program analysers, and constants are important for this application. For illustration, let us consider the very simple program below, which increments a variable x from 0 until it exceeds 100000 and asserts that its value is less than 100005 on exit from the loop.

```
1 int x=0;
2 while (x<=100000) x++;
3 assert(x<100005);
```
Proving the safety of such a program, i.e., that the assertion at line 3 is not violated in any execution of the program, is a task well-suited for synthesis (the Syntax Guided Synthesis Competition [5] has a track dedicated to synthesizing safety invariants). For this example, a safety invariant is x < 100002, which holds on entrance to the loop, is inductive with respect to the loop's body, and implies the assertion on exit from the loop.

While it is very easy for a human to deduce this invariant, the need for a non-trivial constant makes it surprisingly difficult for state-of-the-art synthesizers: both CVC4 (version 1.5) [27] and EUSolver (version 2017-06-15) [3] fail to find a solution within an hour.
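The three conditions that make x < 100002 a safety invariant (initiation, inductiveness, and safety on exit) can be sanity-checked by brute force over a bounded range; this is a quick check, not a synthesis procedure.

```python
# Brute-force check that inv(x) := x < 100002 is a safety invariant for
#   int x = 0; while (x <= 100000) x++; assert(x < 100005);
inv = lambda x: x < 100002

# 1. Initiation: the invariant holds on entrance to the loop.
assert inv(0)

# 2. Inductiveness: if inv and the loop guard hold, inv holds after x++.
assert all(inv(x + 1) for x in range(-10, 100010)
           if inv(x) and x <= 100000)

# 3. Safety: if inv holds and the guard fails, the assertion holds.
assert all(x < 100005 for x in range(-10, 100010)
           if inv(x) and not (x <= 100000))

print("x < 100002 is a safety invariant")
```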

### **4 CEGIS(***T* **)**

#### **4.1 Overview**

In this section, we describe the architecture of CEGIS(T ), which is obtained by augmenting the standard CEGIS loop with a theory solver. As we are particularly interested in the synthesis of programs with constants, we present CEGIS(T ) from this particular perspective. In such a setting, CEGIS is responsible for synthesizing program skeletons, whereas the theory solver generates constraints over the literals that denote constants. These constraints are then propagated back to the synthesizer.

In order to explain the main ideas behind CEGIS(T ) in more detail, we first differentiate between a candidate solution, a candidate solution skeleton, a generalised candidate solution and a final solution.

**Definition 1 (Candidate solution).** *Using the notation in Sect. 2.2, a program* P *is a* candidate solution *if* σ(P, *x*) *holds for every* x *in some subset x<sub>inputs</sub> of all possible inputs.*

**Definition 2 (Candidate solution skeleton).** *Given a candidate solution* P*, the* skeleton *of* P*, denoted by* P[?]*, is obtained by replacing each constant in* P *with a hole.*

**Fig. 3.** CEGIS(T )

**Definition 3 (Generalised candidate solution).** *Given a candidate solution skeleton* P[?]*, we obtain a* generalised candidate P[*v*] *by filling each hole in* P[?] *with a distinct symbolic variable, i.e., variable* v<sub>i</sub> *corresponds to the* i*-th hole. Then* *v* = [v<sub>1</sub>,...,v<sub>n</sub>]*, where* n *denotes the number of holes in* P[?]*.*

**Definition 4 (Final solution).** *A candidate solution P is a* final solution *if the formula* ∀*x*.σ(P, *x*) *is valid.*

*Example 1 (Candidate solution, candidate solution skeleton, generalised candidate solution, final solution).* Given the example in Sect. 3, if *x<sub>inputs</sub>* = {0}, then f(x) = −2 is a candidate solution. The corresponding candidate skeleton is f[?](x) = ? and the generalised candidate is f[v<sub>1</sub>](x) = v<sub>1</sub>. A final solution for this example is f(x) = *ite* (x < 334455) −1 0.
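Definitions 2 and 3 can be illustrated over a tiny expression AST. The tuple encoding below is our own invention for illustration, not the paper's data structure.

```python
# Sketch of Definitions 2-3 over a tiny expression AST. Constants become
# holes ('?'), and holes are then filled with fresh symbolic variables.

def skeleton(expr):
    """Replace every constant in the AST by the hole marker '?'."""
    if isinstance(expr, int):
        return '?'
    if isinstance(expr, tuple):
        return tuple(skeleton(e) for e in expr)
    return expr  # program variables such as 'x' stay as they are

def generalise(expr, counter=None):
    """Fill each hole with a distinct symbolic variable v1, v2, ..."""
    counter = counter if counter is not None else [0]
    if expr == '?':
        counter[0] += 1
        return f'v{counter[0]}'
    if isinstance(expr, tuple):
        return tuple(generalise(e, counter) for e in expr)
    return expr

# Candidate from the running example: f(x) = ite (x < 100) -3 1
candidate = ('ite', ('<', 'x', 100), -3, 1)
print(skeleton(candidate))              # ('ite', ('<', 'x', '?'), '?', '?')
print(generalise(skeleton(candidate)))  # ('ite', ('<', 'x', 'v1'), 'v2', 'v3')
```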

The communication between the synthesizer and the theory solver in CEGIS(T ) is illustrated in Fig. 3 and can be described as follows:


The CEGIS(T ) algorithm is given as Algorithm 1 and proceeds as follows:


#### **4.2 CEGIS(***T* **) with a Theory Solver Based on FM Elimination**

In this section we describe a theory solver based on FM variable elimination. Other techniques for eliminating existentially quantified variables can be used. For instance, one might use cylindrical algebraic decomposition [9] for specifications with non-linear arithmetic. In our case, whenever the specification σ does not belong to linear arithmetic, the FM theory solver is not called.
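For readers unfamiliar with FM elimination, the following is a minimal sketch of the core combination step for non-strict linear inequalities over the rationals. The dictionary encoding is invented for illustration; strict inequalities, integer tightening, and the symbolic constraints E(*v*) over the hole variables are omitted.

```python
# A small Fourier-Motzkin eliminator (illustrative). A constraint is a
# dict {var: coeff, ...} with key '_c' for the constant, and means
#   sum(coeff * var) + _c <= 0.
from fractions import Fraction

def eliminate(constraints, var):
    """Eliminate `var`, returning an equisatisfiable system without it."""
    lower, upper, rest = [], [], []
    for c in constraints:
        a = c.get(var, 0)
        if a == 0:
            rest.append(c)
        else:
            # normalise so the coefficient of `var` is +1 or -1
            norm = {v: Fraction(k, abs(a)) for v, k in c.items() if v != var}
            (upper if a > 0 else lower).append(norm)
    # combine every lower bound (var >= L) with every upper bound (var <= U)
    for lo in lower:
        for up in upper:  # the combination encodes L <= U
            keys = set(lo) | set(up)
            rest.append({v: lo.get(v, 0) + up.get(v, 0) for v in keys})
    return rest

def satisfiable(constraints, variables):
    for v in variables:
        constraints = eliminate(constraints, v)
    # only constants remain: every constraint reads  _c <= 0
    return all(c.get('_c', 0) <= 0 for c in constraints)

# 334456 <= x <= 334454 is infeasible (cf. the bounds in the example below):
system = [{'x': 1, '_c': -334454},   # x - 334454 <= 0
          {'x': -1, '_c': 334456}]   # -x + 334456 <= 0, i.e. x >= 334456
print(satisfiable(system, ['x']))  # False
```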

As mentioned above, we need to produce a constraint over the variables *v* describing the situation in which P∗[*v*] is a final solution. For this purpose, we consider the formula ∃*x*.¬σ(P∗[*v*], *x*), which is satisfiable for a given valuation of *v* exactly if the specification σ admits a counterexample *x* for P∗[*v*]. Let E(*v*) be the formula obtained by eliminating *x* from ∃*x*.¬σ(P∗[*v*], *x*). If ¬E(*v*) is satisfiable, any satisfiability witness [c<sub>1</sub>,...,c<sub>n</sub>] gives us the necessary valuation for *v*:

$$C(P, P^\*, v) = \bigwedge\_{i=1}^{n} v\_i = c\_i.$$

**Algorithm 1.** CEGIS(T)


If ¬E(*v*) is UNSAT, then the current skeleton P∗[?] needs to be blocked. This reasoning is supported by Lemma 1 and Corollary 1.

**Lemma 1.** *Let* E(*v*) *be the formula obtained by eliminating* x *from* ∃*x*.¬σ(P∗[*v*], *x*)*. Then, any witness* v<sup>#</sup> *to the satisfiability of* ¬E(*v*) *gives us a final solution* P∗[v<sup>#</sup>] *to the synthesis problem.*

*Proof.* From the fact that E(*v*) is obtained by eliminating *x* from ∃*x*.¬σ(P∗[*v*], *x*), we get that E(*v*) is equivalent to ∃*x*.¬σ(P∗[*v*], *x*) (we use ≡ to denote equivalence):

$$E(v) \equiv \exists x. \ \neg \sigma(P^\*[v], x).$$

Then:

$$
\neg E(v) \equiv \forall x. \; \sigma(P^\*[v], x) .
$$

Consequently, any v<sup>#</sup> satisfying ¬E(*v*) also satisfies ∀*x*. σ(P∗[*v*], *x*). From ∀*x*. σ(P∗[v<sup>#</sup>], *x*) and Definition 4 we get that P∗[v<sup>#</sup>] is a final solution.

**Corollary 1.** *Let* E(v) *be the formula that is obtained by eliminating x from* ∃*x*.¬σ(P∗[*v*], *x*)*. If* ¬E(*v*) *is unsatisfiable, then the corresponding synthesis problem does not admit a solution for the skeleton* P∗[?]*.*

*Proof.* Given that ¬E(*v*) ≡ ∀*x*. σ(P∗[*v*], *x*), if ¬E(*v*) is unsatisfiable, so is ∀*x*. σ(P∗[*v*], *x*), meaning that there is no valuation for *v* such that the specification σ is obeyed for all inputs *x*.

For the current skeleton P∗[?], the constraint E(*v*) generalises the concrete counterexample *cex* (found during the CEGIS verification phase) in the sense that the instantiation v<sup>#</sup> of *v* for which *cex* failed the specification, i.e., ¬σ(P∗[v<sup>#</sup>], *cex*), is a satisfiability witness for E(*v*). This holds because E(*v*) ≡ ∃*x*.¬σ(P∗[*v*], *x*), which means that the satisfiability witness (v<sup>#</sup>, *cex*) for ¬σ(P∗[*v*], *x*), projected onto *v*, is a satisfiability witness for E(*v*).

**Disjunction.** The specification σ and the candidate solution may contain disjunctions. However, most theory solvers (and in particular FM variable elimination [7]) work on conjunctive fragments only. A naïve approach could use case-splitting, i.e., transforming the formula into Disjunctive Normal Form (DNF) and then solving each clause separately. This can result in a number of clauses exponential in the size of the original formula. Instead, we handle disjunction using the Boolean Fourier-Motzkin procedure [20,32]. As a result, the constraints we generate may be non-clausal.

**Applying CEGIS(***T* **) with FM to the Motivational Example.** We recall the example in Sect. 3 and apply CEGIS(T ). The problem is

$$\exists f. \forall x. \, (x < 334455 \to f(x) < 0) \land (x \ge 334455 \to f(x) = 0)$$

which gives us the following specification:

$$
\sigma(f, x) = (x \ge 334455 \lor f(x) < 0) \land (x < 334455 \lor f(x) = 0).
$$

The first synthesis phase generates the candidate f∗(x) = 0, for which the verification phase returns the concrete counterexample x = 0. As this candidate contains the constant 0, we generalise it to f∗[v<sub>1</sub>](x) = v<sub>1</sub>, for which we get

$$
\sigma(f^\*[v\_1], x) = (x \ge 334455 \lor v\_1 < 0) \land (x < 334455 \lor v\_1 = 0).
$$

Next, we use FM to eliminate x from

$$\exists x. \neg(\sigma(f^\*[v\_1], x)) = \exists x. (x < 334455 \land v\_1 \ge 0) \lor (x \ge 334455 \land v\_1 \ne 0).$$

Note that, given that the formula ¬σ(f∗[v<sub>1</sub>], x) is in DNF, for convenience we directly apply FM to each disjunct and obtain E(v<sub>1</sub>) = v<sub>1</sub> ≥ 0 ∨ v<sub>1</sub> ≠ 0, which characterises all the values of v<sub>1</sub> for which there exists a counterexample. Negating E(v<sub>1</sub>) gives v<sub>1</sub> < 0 ∧ v<sub>1</sub> = 0, which is UNSAT. As there is no valuation of v<sub>1</sub> for which the current f∗ is a final solution, the result returned by the theory solver is (*false*, f[?] ≠ f∗[?]), which is used to augment the specification. Subsequently, a new CEGIS(T) iteration starts. The learning phase has changed the specification σ to

$$\sigma(f, x) = (x \ge 334455 \lor f(x) < 0) \land (x < 334455 \lor f(x) = 0) \land f[?] \neq f^\*[?].$$

This forces the synthesis phase to pick a new candidate solution with a different skeleton. The new candidate solution we get is f∗(x) = *ite* (x < 100) −3 1, which works for the previous counterexample x = 0. However, the verification phase returns the counterexample x = 100. Again, this candidate contains constants, which we replace by symbolic variables, obtaining

$$f^\*[v\_1, v\_2, v\_3](x) = ite \ (x < v\_1) \ v\_2 \ v\_3.$$

Next, we use FM to eliminate x from

$$\begin{aligned} &\exists x.\neg(\sigma(f^\*[v\_1, v\_2, v\_3], x)) = \\ &\exists x.\neg((x \ge 334455 \lor ((x < v\_1 \to v\_2 < 0) \land (x \ge v\_1 \to v\_3 < 0))) \land {} \\ &\quad (x < 334455 \lor ((x < v\_1 \to v\_2 = 0) \land (x \ge v\_1 \to v\_3 = 0)))) = \\ &\exists x.\neg((x \ge 334455 \lor x \ge v\_1 \lor v\_2 < 0) \land (x \ge 334455 \lor x < v\_1 \lor v\_3 < 0) \land {} \\ &\quad (x < 334455 \lor x \ge v\_1 \lor v\_2 = 0) \land (x < 334455 \lor x < v\_1 \lor v\_3 = 0)) = \\ &\exists x.((x < 334455 \land x < v\_1 \land v\_2 \ge 0) \lor (x < 334455 \land x \ge v\_1 \land v\_3 \ge 0) \lor {} \\ &\quad (x \ge 334455 \land x < v\_1 \land v\_2 \ne 0) \lor (x \ge 334455 \land x \ge v\_1 \land v\_3 \ne 0)). \end{aligned}$$

As we work with integers, we can rewrite x < 334455 as x ≤ 334454 and x < v<sub>1</sub> as x ≤ v<sub>1</sub> − 1. Then, applying FM to each disjunct of ¬σ(f∗[v<sub>1</sub>, v<sub>2</sub>, v<sub>3</sub>], x) as before, we obtain the following constraint E(v<sub>1</sub>, v<sub>2</sub>, v<sub>3</sub>)

$$E(v\_1, v\_2, v\_3) = v\_2 \ge 0 \lor (v\_1 \le 334454 \land v\_3 \ge 0) \lor (v\_1 \ge 334456 \land v\_2 \ne 0) \lor v\_3 \ne 0$$

whose negation is

$$\neg E(v\_1, v\_2, v\_3) = v\_2 < 0 \land \left(v\_1 > 334454 \lor v\_3 < 0\right) \land \left(v\_1 < 334456 \lor v\_2 = 0\right) \land v\_3 = 0$$

A satisfiability witness is v<sub>1</sub> = 334455, v<sub>2</sub> = −1 and v<sub>3</sub> = 0. Thus, the result returned by the theory solver is (*true*, v<sub>1</sub> = 334455 ∧ v<sub>2</sub> = −1 ∧ v<sub>3</sub> = 0), which is used by CEGIS to obtain the final solution

$$f^\*(x) = ite \ (x < 334455) \ -1 \ 0.$$
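The witness can be sanity-checked numerically against ¬E(v<sub>1</sub>, v<sub>2</sub>, v<sub>3</sub>) and against the original specification σ; this is a quick check, not part of the algorithm.

```python
# Numeric sanity check: the witness v1 = 334455, v2 = -1, v3 = 0 satisfies
# the negated constraint not-E, and the resulting program meets the
# specification sigma on a sample of inputs.
def not_E(v1, v2, v3):
    return (v2 < 0 and (v1 > 334454 or v3 < 0)
            and (v1 < 334456 or v2 == 0) and v3 == 0)

def sigma(f, x):
    return (x >= 334455 or f(x) < 0) and (x < 334455 or f(x) == 0)

v1, v2, v3 = 334455, -1, 0
assert not_E(v1, v2, v3)

f = lambda x: v2 if x < v1 else v3   # ite (x < v1) v2 v3
assert all(sigma(f, x) for x in [0, 1, 100, 334454, 334455, 334456, 10**9])
print("witness checks out")
```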

#### **4.3 CEGIS(***T* **) with an SMT-based Theory Solver**

For our second variant of a theory solver, we make use of an off-the-shelf SMT solver that supports quantified first-order formulae. This approach is more generic than the one described in Sect. 4.2, as there are solvers for a broad range of theories.

Recall that our goal is to obtain a constraint C(P, P∗, *v*) that either characterises the valuations of *v* for which P∗[*v*] is a final solution or blocks P∗[?] whenever no such valuation exists. Consequently, we use the SMT solver to check the satisfiability of the formula

$$
\Phi = \forall x. \,\sigma(P^\*[v], x).
$$

If Φ is satisfiable, then any satisfiability witness *c* = [c<sub>1</sub>,...,c<sub>n</sub>] gives us a valuation for *v* such that P∗ is a final solution: C(P, P∗, *v*) = ⋀<sub>i=1..n</sub> v<sub>i</sub> = c<sub>i</sub>. Conversely, if Φ is unsatisfiable, then C(P, P∗, *v*) must block the current skeleton P∗[?]: C(P, P∗, *v*) = (P[?] ≠ P∗[?]).
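As a toy stand-in for the SMT call: the implementation queries Z3 on the quantified formula, whereas the sketch below brute-forces small finite domains and shrinks the threshold from 334455 to 7 to keep the search tiny.

```python
# Bounded stand-in for the SMT-based theory solver: instead of asking an
# SMT solver whether Phi = forall x. sigma(P*[v], x) has a model for v,
# search a finite grid of valuations for v and a finite input range for x.
from itertools import product

def smt_theory_solver(sigma_of, v_domains, x_domain):
    for v in product(*v_domains):
        if all(sigma_of(v, x) for x in x_domain):
            return True, v   # constraint C fixes v to this witness
    return False, None       # constraint C blocks the skeleton P*[?]

# Toy version of the example (threshold 7): sigma for f[v1, v2, v3](x).
def sigma_of(v, x):
    v1, v2, v3 = v
    fx = v2 if x < v1 else v3          # ite (x < v1) v2 v3
    return (x >= 7 or fx < 0) and (x < 7 or fx == 0)

ok, witness = smt_theory_solver(sigma_of,
                                [range(0, 10), range(-3, 3), range(-3, 3)],
                                range(-3, 15))
print(ok, witness)  # True (7, -3, 0)
```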

**Applying SMT-based CEGIS(T) to the Motivational Example.** Again, we recall the example in Sect. 3. We solve it using SMT-based CEGIS(T) for the theory of linear arithmetic. For this purpose, we assume that the synthesis phase finds the same sequence of candidate solutions as in Sect. 4.2. Namely, the first candidate is f∗(x) = 0, which gets generalised to f∗[v<sub>1</sub>](x) = v<sub>1</sub>. Then, the first SMT call is for ∀x. σ(v<sub>1</sub>, x), where

$$
\sigma(v\_1, x) = (x \ge 334455 \lor v\_1 < 0) \land (x < 334455 \lor v\_1 = 0).
$$

The SMT solver returns UNSAT, which means that C(f, f∗, v<sub>1</sub>) = (f[?] ≠ f∗[?]). The second candidate is f∗(x) = *ite* (x < 100) −3 1, which generalises to f∗[v<sub>1</sub>, v<sub>2</sub>, v<sub>3</sub>](x) = *ite* (x < v<sub>1</sub>) v<sub>2</sub> v<sub>3</sub>. The corresponding call to the SMT solver is for ∀x. σ((*ite* (x < v<sub>1</sub>) v<sub>2</sub> v<sub>3</sub>), x), for which we obtain the satisfiability witness v<sub>1</sub> = 334455, v<sub>2</sub> = −1 and v<sub>3</sub> = 0. Then C(f, f∗, v<sub>1</sub>, v<sub>2</sub>, v<sub>3</sub>) = (v<sub>1</sub> = 334455 ∧ v<sub>2</sub> = −1 ∧ v<sub>3</sub> = 0), which gives us the same final solution we obtained when using FM in Sect. 4.2.

#### **5 Experimental Evaluation**

#### **5.1 Implementation**

*Incremental Satisfiability Solving.* Our implementation of CEGIS may sometimes perform hundreds of loop iterations before finding the correct solution. Recall that the synthesis block of CEGIS is based on Bounded Model Checking (BMC). Ultimately, this BMC module performs calls to a SAT solver. Consequently, we may have hundreds of calls to this SAT solver, which are all very similar (the same base specification with some extra constraints added in each iteration). This makes CEGIS a prime candidate for incremental SAT solving. We implemented incremental solving in the synthesis block of CEGIS.

#### **5.2 Benchmarks**

We have selected a set of bitvector benchmarks from the Syntax-Guided Synthesis (SyGuS) competition [4] and a set of benchmarks synthesizing safety invariants and danger invariants for C programs [10]. All benchmarks are written in SyGuS-IF [26], a variant of SMT-LIB2.

Given that the syntactic restrictions (called the *grammar* or the *template*) provided in the SyGuS benchmarks contain all the necessary non-trivial constants, we removed them completely from these benchmarks. Removing just the non-trivial constants and keeping the rest of the grammar (with the only constants being 0 and 1) would have made the problem much more difficult, as the constants would have had to be incrementally constructed by applying the operators available to 0 and 1.

We group the benchmarks into three categories: invariant generation, which covers danger invariants, safety invariants and the class of invariant generation benchmarks from the SyGuS competition; hackers/crypto, which includes benchmarks from hackers-delight and cryptographic circuits; and comparisons, composed of benchmarks that require synthesizing longer programs with comparisons, e.g., finding the maximum value of 10 variables.

#### **5.3 Experimental Setup**

We conduct the experimental evaluation on a 12-core 2.40 GHz Intel Xeon E5-2440 with 96 GB of RAM, running Linux. We use the Linux *times* command to measure the CPU time used for each benchmark. The runtime is limited to 600 s per benchmark. We use MiniSat [12] as the SAT solver, and Z3 v4.5.1 [22] as the SMT solver in CEGIS(T) with the SMT-based theory solver. The SAT solver could, in principle, be replaced with Z3 to solve benchmarks over a broader range of theories.

We present results for four different configurations of CEGIS:


We compare our results against the latest release of CVC4, version 1.5. As we are interested in running our benchmarks without any syntactic template, the first reason for choosing CVC4 [6] as our comparison point is that it performs well when no such template is provided. This is illustrated by the fact that it won the Conditional Linear Integer Arithmetic track of the SyGuS competition 2017 [4], one of the two tracks where a syntactic template was not used. The other track without syntactic templates is the invariant generation track, in which CVC4 was a close second to LoopInvGen [24]. A second reason for picking CVC4 is its overall good performance across all benchmarks, whereas LoopInvGen is a solver specialised for invariant generation.

We also give a row of results for a hypothetical 4-core implementation, as would be allowed in the SyGuS Competition, running 4 configurations in parallel: CEGIS(T )-FM, CEGIS(T )-SMT, CEGIS, and CEGIS-Inc. A link to the full experimental environment, including scripts to reproduce the results, all benchmarks and the tool, is provided in the footnote as an Open Virtual Appliance (OVA)<sup>1</sup>.

<sup>1</sup> www.cprover.org/synthesis.


**Table 1.** Experimental results – for every set of benchmarks, we give the number of benchmarks solved by each configuration within the timeout and the average time taken per solved benchmark

#### **5.4 Results**

The results are given in Table 1. In combination, our CEGIS configurations (i.e., CEGIS multi-core) solve 27 more benchmarks than CVC4, but the average time per benchmark is significantly higher.

As expected, both CEGIS(T)-SMT and CEGIS(T)-FM solve more of the invariant generation benchmarks, which require synthesizing arbitrary constants, than CVC4 does. Conversely, CVC4 performs better on benchmarks that require synthesizing long programs with many comparison operations, e.g., finding the maximum value in a series of numbers. CVC4 also solves more of the hackers-delight and cryptographic circuit benchmarks, none of which require constants.

Our implementation of basic CEGIS (and consequently of all configurations built on top of it) only increases the length of the synthesized program when no program of a shorter length exists. Thus, it is expensive to synthesize longer programs, but a benefit of this architecture is that the synthesized programs are of the minimum possible length. Many of the expressions synthesized by CVC4 are very large; this has been noted previously in the Syntax-Guided Synthesis Competition [5], and synthesizing without a syntactic template makes the synthesized expressions even longer.

Although CEGIS-Inc is quicker per iteration of the CEGIS loop than basic CEGIS, the average time per benchmark is not significantly better because of the variation in times produced by CEGIS. We hypothesise that the use of incremental solving makes CEGIS-Inc more prone to getting stuck exploring "bad" areas of the solution space than basic CEGIS, and so it requires more iterations than basic CEGIS for some benchmarks. Incremental solving preserves clauses learnt from conflicts in previous iterations, which means that each SAT-solving iteration begins from exactly the same state as the previous one. The basic implementation does not preserve these clauses and so is free to start exploring a new part of the search space in each iteration. These effects could be mitigated by running multiple incremental solving instances in parallel.

In order to validate the assumption that CVC4 works better without a template than with one from which the non-trivial constants were removed (see Sect. 5.2), we also ran CVC4 on a subset of the benchmarks with a syntactic template comprising the full instruction set we give to CEGIS, plus the constants 0 and 1. Note that for some benchmarks it is not possible to add a grammar, because SyGuS-IF does not allow syntactic templates for benchmarks that use the loop-invariant syntax. With a grammar, CVC4 solves fewer of the benchmarks and takes longer per benchmark. The syntactic template is helpful only in cases where non-trivial constants are needed and those constants are contained within the template.

We ran EUSolver on the benchmarks with the syntactic templates, but its bitvector support is incomplete and missing some key operations. As a result, EUSolver was unable to solve any benchmarks in the set, so we have not included the results in the table.

*Benefit of Literal Constants.* We investigated how useful the constants in the problem specification are by trying a configuration that seeds all constants in the problem specification as hints into the synthesis engine. This proved helpful only for basic CEGIS, not for the CEGIS(T) configurations; our hypothesis is that the latter do not benefit because they already have good support for computing constants. This option is disabled in the results presented in this section.

#### **5.5 Threats to Validity**

*Benchmark Selection:* We report an assessment of our approach on a diverse selection of benchmarks. Nevertheless, the set of benchmarks is limited within the scope of this paper, and the performance may not generalise to other benchmarks.

*Comparison with State of the Art:* CVC4 has not, as far as we are aware, been used for synthesis of bitvector functions without syntactic templates, and so this unanticipated use case may not have been fully tested. We are unable to compare all results to other solvers from the SyGuS Competition because EUSolver and EUPhony do not support synthesizing bitvector programs without a syntactic template, EUSolver's support for bitvectors is incomplete even when used with a template, LoopInvGen and DryadSynth do not support bitvectors, and E3Solver tackles only Programming By Example benchmarks [5].

*Choice of Theories:* We evaluated the benefits of CEGIS(T ) in the context of two specific theory instances. While the improvements in our experiments are significant, it is uncertain whether this will generalise to other theories.

#### **6 Related Work**

The traditional view of program synthesis is that of synthesis from complete specifications [21]. Such specifications are often unavailable, difficult to write, or expensive to check against using automated verification techniques. This has led to the proposal of inductive synthesis and, more recently, of oracle-based inductive synthesis, in which the complete specification is not available and oracles are queried to choose programs [19].

A well-known application of CEGIS is program sketching [29,31], where the programmer uses a partial program, called a *sketch*, to describe the desired implementation strategy, and leaves the low-level details of the implementation to an automated synthesis procedure. Inspired by sketching, Syntax-Guided Program Synthesis (SyGuS) [2] requires the user to supplement the logical specification provided to the program synthesizer with a syntactic template that constrains the space of solutions. In contrast to SyGuS, our aim is to improve the efficiency of the exploration to the point that user guidance is no longer required.

Another very active area of program synthesis is component-based approaches [1,13–15,17,18,25]. Such approaches are concerned with assembling programs from a database of existing components and make use of various techniques, from counterexample-guided synthesis [17] to type-directed search with lightweight SMT-based deduction and partial evaluation [14] and Petri nets [15]. The techniques developed in the current paper are applicable to any component-based synthesis approach that relies on counterexample-guided inductive synthesis.

Heuristics for constant synthesis are presented in [11], where the solution language is parameterised, inducing a lattice of progressively more expressive languages. One of the parameters is word width, which allows synthesizing programs with constants that satisfy the specification for smaller word widths. Subsequently, heuristics extend the program (including the constants) to the required word width. In contrast to this work, CEGIS(T) is a systematic approach that does not rely on ad-hoc heuristics.

Regarding the use of SMT solvers in program synthesis, they are frequently employed as oracles. By contrast, Reynolds et al. [27] present an efficient encoding able to solve program synthesis constraints directly within an SMT solver. Their approach relies on rephrasing the synthesis constraint as the problem of refuting a universally quantified formula, which can be solved using first-order quantifier instantiation. Conversely, in our approach we maintain a clear separation between the synthesizer and the theory solver, which communicate in a well-defined manner. In Sect. 5, we provide a comprehensive experimental comparison with the synthesizer described in [27].

#### **7 Conclusion**

We proposed CEGIS(T), a new approach to program synthesis that combines the strengths of a counterexample-guided inductive synthesizer with those of a theory solver to explore the solution space more efficiently. We discussed two options for the theory solver, one based on FM variable elimination and one relying on an off-the-shelf SMT solver. Our experimental results showed that, although slower than CVC4, CEGIS(T) can solve, within a reasonable time, benchmarks that require synthesizing arbitrary constants, on which CVC4 fails.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **Synthesizing Reactive Systems from Hyperproperties**

Bernd Finkbeiner, Christopher Hahn, Philip Lukert , Marvin Stenger, and Leander Tentrup(B)

> Reactive Systems Group, Saarland University, Saarbrücken, Germany {finkbeiner,hahn,lukert, stenger,tentrup}@react.uni-saarland.de

**Abstract.** We study the reactive synthesis problem for hyperproperties given as formulas of the temporal logic HyperLTL. Hyperproperties generalize trace properties, i.e., sets of traces, to *sets of sets* of traces. Typical examples are information-flow policies like noninterference, which stipulate that no sensitive data must leak into the public domain. Such properties cannot be expressed in standard linear or branching-time temporal logics like LTL, CTL, or CTL<sup>∗</sup>. We show that, while the synthesis problem is undecidable for full HyperLTL, it remains decidable for the ∃<sup>∗</sup>, ∃<sup>∗</sup>∀<sup>1</sup>, and the *linear* ∀<sup>∗</sup> fragments. Beyond these fragments, the synthesis problem immediately becomes undecidable. For universal HyperLTL, we present a semi-decision procedure that constructs implementations and counterexamples up to a given bound. We report encouraging experimental results obtained with a prototype implementation on example specifications with hyperproperties like symmetric responses, secrecy, and information flow.

#### **1 Introduction**

*Hyperproperties* [5] generalize trace properties in that they not only check the correctness of *individual* computation traces in isolation, but relate *multiple* computation traces to each other. HyperLTL [4] is a logic for expressing temporal hyperproperties that extends linear-time temporal logic (LTL) with *explicit* quantification over traces. HyperLTL has been used to specify a variety of information-flow and security properties. Examples include classical properties like non-interference and observational determinism, as well as quantitative information-flow properties, symmetries in hardware designs, and formally verified error correcting codes [12]. For example, observational determinism can be expressed as the HyperLTL formula ∀π∀π′. (I<sub>π</sub> = I<sub>π′</sub>) → (O<sub>π</sub> = O<sub>π′</sub>), stating that, for every pair of traces, if the observable inputs are the same, then the observable outputs must be the same as well. While the satisfiability [9], model checking [4,12], and runtime verification [1,10] problems for HyperLTL have been studied, the *reactive synthesis* problem for HyperLTL is, so far, still open.

Supported by the European Research Council (ERC) Grant OSARES (No. 683300). © The Author(s) 2018

H. Chockler and G. Weissenbacher (Eds.): CAV 2018, LNCS 10981, pp. 289–306, 2018. https://doi.org/10.1007/978-3-319-96145-3\_16

In reactive synthesis, we automatically construct an implementation that is guaranteed to satisfy a given specification. A fundamental difference to verification is that there is no human programmer involved: in verification, the programmer would first produce an implementation, which is then verified against the specification. In synthesis, the implementation is directly constructed from the specification. Because there is no programmer, it is crucial that the specification contains *all* desired properties of the implementation: the synthesized implementation is guaranteed to satisfy the given specification, but nothing is guaranteed beyond that. The added expressive power of HyperLTL over LTL is very attractive for synthesis: with synthesis from hyperproperties, we can guarantee that the implementation does not only accomplish the desired functionality, but is also free of information leaks, is symmetric, is fault-tolerant with respect to transmission errors, etc.

More formally, the reactive synthesis problem asks for a *strategy*, that is, a tree branching on the environment inputs, whose nodes are labeled by the system output. Collecting the inputs and outputs along a branch of the tree, we obtain a trace. If the set of traces collected from the branches of the strategy tree satisfies the specification, we say that the strategy *realizes* the specification. The specification is *realizable* iff there exists a strategy tree that realizes the specification. With LTL specifications, we get trees where the trace on each individual branch satisfies the LTL formula. With HyperLTL, we additionally get trees where the traces on different branches are in a specified relationship. This is dramatically more powerful.

Consider, for example, the well-studied *distributed* version of the reactive synthesis problem, where the system is split into a set of processes that each only see a subset of the inputs. The distributed synthesis problem for LTL can be expressed as the standard (non-distributed) synthesis problem for HyperLTL, by adding for each process the requirement that the process output is *observationally deterministic* in the process input. HyperLTL synthesis thus subsumes distributed synthesis. The information-flow requirements realized by HyperLTL synthesis can, however, be much more sophisticated than the observational determinism needed for distributed synthesis. Consider, for example, the *dining cryptographers* problem [3]: three cryptographers Ca, Cb, and Cc sit at a table in a restaurant having dinner, and either one of the cryptographers or, alternatively, the NSA must pay for their meal. Is there a protocol where each cryptographer can find out whether it was a cryptographer who paid or the NSA, but cannot find out which cryptographer paid the bill?

Synthesis from LTL formulas is known to be decidable in doubly exponential time. The fact that the distributed synthesis problem is undecidable [21] immediately eliminates the hope for a similar general result for HyperLTL. However, since LTL is obviously a fragment of HyperLTL, this immediately leads to the question whether the synthesis problem is still decidable for fragments of HyperLTL that are close to LTL but go beyond it: when exactly does the synthesis problem become undecidable? From a more practical point of view, the interesting question is whether semi-algorithms for distributed synthesis [7,14], which have been successful in constructing distributed systems from LTL specifications despite the undecidability of the general problem, can be extended to HyperLTL.

In this paper, we answer the first question by studying the ∃∗, the ∃∗∀¹, and the *linear* ∀∗ fragment. We show that the synthesis problem for all three fragments is decidable, and that the problem becomes undecidable as soon as we go beyond these fragments. In particular, the synthesis problem for the full ∀∗ fragment, which includes observational determinism, is undecidable.

We answer the second question by studying the *bounded* version of the synthesis problem for the ∀<sup>∗</sup> fragment. In order to detect realizability, we ask whether, for a universal HyperLTL formula ϕ and a given bound n on the number of states, there exists a representation of the strategy tree as a finite-state machine with no more than n states that satisfies ϕ. To detect unrealizability, we check whether there exists a counterexample to realizability of bounded size. We show that both checks can be effectively reduced to SMT solving.

**Related Work.** HyperLTL [4] is a successor of the temporal logic SecLTL [6] used to characterize temporal information-flow. The model-checking [4,12], satisfiability [9], and monitoring problems [1,10], as well as the first-order extension [17] of HyperLTL, have been studied before. To the best of the authors' knowledge, this is the first work that considers the synthesis problem for temporal hyperproperties. We base our algorithms on well-known synthesis techniques such as bounded synthesis [14], which is itself an instance of Safraless synthesis [18] for ω-regular languages. Further techniques that we adapt for hyperproperties are lazy synthesis [11] and the bounded unrealizability method [15,16].

Hyperproperties [5] can be seen as a unifying framework for many different properties of interest in multiple distinct areas of research. Information-flow properties in security and privacy research are hyperproperties [4]. HyperLTL subsumes logics that reason over knowledge [4]. Information-flow in distributed systems is another example of hyperproperties, and the HyperLTL realizability problem subsumes both the distributed synthesis problem [13,21] as well as the synthesis of fault-tolerant systems [16]. In circuit verification, the semantic independence of circuit output signals from a certain set of inputs, which enables a range of potential optimizations, is a hyperproperty.

#### **2 Preliminaries**

*HyperLTL.* HyperLTL [4] is a temporal logic for specifying hyperproperties. It extends LTL by quantification over trace variables π and a method to link atomic propositions to specific traces. The set of trace variables is V. Formulas in HyperLTL are given by the grammar

ϕ ::= ∀π.ϕ | ∃π.ϕ | ψ , and ψ ::= a_π | ¬ψ | ψ ∨ ψ | ○ψ | ψ U ψ ,

where a ∈ AP and π ∈ V. The alphabet of a HyperLTL formula is 2^AP. We allow the standard Boolean connectives ∧, →, ↔ as well as the derived LTL operators release ϕ R ψ ≡ ¬(¬ϕ U ¬ψ), eventually ◇ϕ ≡ *true* U ϕ, globally □ϕ ≡ ¬◇¬ϕ, and weak until ϕ W ψ ≡ □ϕ ∨ (ϕ U ψ).

The semantics is given by the satisfaction relation ⊨_T over a set of traces T ⊆ (2^AP)^ω. We define an assignment Π : V → (2^AP)^ω that maps trace variables to traces. Π[i,∞] denotes the assignment where the first i items are removed from each trace, i.e., it maps every π to Π(π)[i,∞].

$$
\begin{aligned}
\Pi &\models\_T a\_\pi & \text{iff} \quad & a \in \Pi(\pi)[0] \\
\Pi &\models\_T \neg\varphi & \text{iff} \quad & \Pi \not\models\_T \varphi \\
\Pi &\models\_T \varphi \lor \psi & \text{iff} \quad & \Pi \models\_T \varphi \text{ or } \Pi \models\_T \psi \\
\Pi &\models\_T \bigcirc\varphi & \text{iff} \quad & \Pi[1,\infty] \models\_T \varphi \\
\Pi &\models\_T \varphi \,\mathcal{U}\, \psi & \text{iff} \quad & \exists i \geq 0.\ \Pi[i,\infty] \models\_T \psi \text{ and } \forall 0 \leq j < i.\ \Pi[j,\infty] \models\_T \varphi \\
\Pi &\models\_T \exists \pi. \varphi & \text{iff} \quad & \text{there is some } t \in T \text{ such that } \Pi[\pi \to t] \models\_T \varphi \\
\Pi &\models\_T \forall \pi. \varphi & \text{iff} \quad & \text{for all } t \in T \text{ it holds that } \Pi[\pi \to t] \models\_T \varphi
\end{aligned}
$$

We write T ⊨ ϕ for {} ⊨_T ϕ, where {} denotes the empty assignment. Two HyperLTL formulas ϕ and ψ are equivalent, written ϕ ≡ ψ, if they have the same models.

*(In)dependence* is a common hyperproperty for which we define the following syntactic sugar. Given two disjoint subsets of atomic propositions C ⊆ AP and A ⊆ AP, we define independence as the following HyperLTL formula

$$D\_{A \to C} \coloneqq \forall \pi \forall \pi'. \left(\bigvee\_{a \in A} \neg(a\_{\pi} \leftrightarrow a\_{\pi'})\right) \mathcal{R}\left(\bigwedge\_{c \in C} (c\_{\pi} \leftrightarrow c\_{\pi'})\right) \,. \tag{1}$$

This guarantees that every proposition c ∈ C depends solely on the propositions in A.
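The release semantics of D_{A→C} can be evaluated on finite trace prefixes: the C-propositions must agree at every position up to and including the first position where the A-propositions differ. A small Python sketch (our own illustration, not part of the paper):

```python
def independent(t1, t2, A, C):
    """Bounded check of D_{A->C} on two finite trace prefixes: the
    C-propositions must agree at every step up to and including the first
    step at which the A-propositions differ (release semantics)."""
    for s1, s2 in zip(t1, t2):
        if (s1 & C) != (s2 & C):
            return False   # C differed while the obligation was still active
        if (s1 & A) != (s2 & A):
            return True    # A differs here: the release is triggered
    return True            # C agreed on the entire common prefix

# c may only start to differ after a has differed on the two traces.
assert independent([{"a", "c"}, {"c"}], [{"a", "c"}, {"a", "c"}],
                   A={"a"}, C={"c"})
assert not independent([{"c"}], [set()], A={"a"}, C={"c"})
```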

*Strategies.* A *strategy* f : (2^I)* → 2^O maps sequences of input valuations 2^I to an output valuation 2^O. The behavior of a strategy f is characterized by an infinite tree that branches by the valuations of I and whose nodes w ∈ (2^I)* are labeled with the strategic choice f(w). For an infinite word w = w₀w₁w₂ ··· ∈ (2^I)^ω, the corresponding labeled path is defined as (f(ε) ∪ w₀)(f(w₀) ∪ w₁)(f(w₀w₁) ∪ w₂) ··· ∈ (2^{I∪O})^ω. We lift the set containment operator ∈ to the containment of a labeled path w = w₀w₁w₂ ··· ∈ (2^{I∪O})^ω in the strategy tree induced by f : (2^I)* → 2^O, i.e., w ∈ f if, and only if, f(ε) = w₀ ∩ O and f((w₀ ∩ I) ··· (wᵢ ∩ I)) = wᵢ₊₁ ∩ O for all i ≥ 0. We define the satisfaction of a HyperLTL formula ϕ (over propositions I ∪ O) on a strategy f, written f ⊨ ϕ, as {w | w ∈ f} ⊨ ϕ. Thus, a strategy f is a model of ϕ if the set of labeled paths of f is a model of ϕ.
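These definitions translate directly into code on finite prefixes. The sketch below (our own; the example strategy is an assumption for illustration) builds the labeled path of an input sequence and checks the lifted containment w ∈ f:

```python
def labeled_path(f, inputs):
    """Build (f(eps) | w0)(f(w0) | w1)... for a finite input sequence."""
    return [f(tuple(inputs[:k])) | w for k, w in enumerate(inputs)]

def contained(f, path, I, O):
    """Lifted containment w in f, checked on a finite labeled prefix."""
    for k, step in enumerate(path):
        prefix = tuple(p & I for p in path[:k])
        if f(prefix) != step & O:
            return False
    return True

# Example strategy (an assumption for illustration): raise "o" one step
# after having seen the input "i".
def f(history):
    return {"o"} if history and "i" in history[-1] else set()

path = labeled_path(f, [{"i"}, set()])
assert path == [{"i"}, {"o"}]
assert contained(f, path, I={"i"}, O={"o"})
```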

#### **3 HyperLTL Synthesis**

In this section, we identify fragments of HyperLTL for which the realizability problem is decidable. Our findings are summarized in Table 1.

**Definition 1 (HyperLTL Realizability).** *A HyperLTL formula* ϕ *over atomic propositions* AP = I ∪̇ O *is realizable if there is a strategy* f : (2^I)* → 2^O *that satisfies* ϕ*.*



We base our investigation on the structure of the quantifier prefix of the HyperLTL formulas. We call a HyperLTL formula ϕ (quantifier) *alternation-free* if the quantifier prefix consists solely of either universal or existential quantifiers. We denote the corresponding fragments as the (universal) ∀∗ and the (existential) ∃∗ fragment, respectively. A HyperLTL formula is in the ∃∗∀∗ fragment if it starts with arbitrarily many existential quantifiers, followed by arbitrarily many universal quantifiers; the ∀∗∃∗ fragment is defined analogously. For a given natural number n, we refer to a bounded number of quantifiers with ∀ⁿ and ∃ⁿ, respectively. The ∀¹ realizability problem is equivalent to the LTL realizability problem.

*∃∗* **Fragment.** We show that the realizability problem for existential HyperLTL is PSpace-complete. We reduce the realizability problem to the satisfiability problem for bounded one-alternating ∃∗∀² HyperLTL [9], i.e., to finding a trace set T such that T ⊨ ϕ.

**Lemma 1.** *An existential HyperLTL formula* ϕ *is realizable if, and only if,* ψ := ϕ ∧ D_{I→O} *is satisfiable.*

*Proof.* Assume f : (2^I)* → 2^O realizes ϕ, that is, f ⊨ ϕ. Let T = {w | w ∈ f} be the set of traces generated by f. It holds that T ⊨ ϕ and T ⊨ D_{I→O}. Therefore, ψ is satisfiable. Conversely, assume ψ is satisfiable and let S be a set of traces that satisfies ψ. We construct a strategy f : (2^I)* → 2^O as

$$f(\sigma) = \begin{cases} w\_{|\sigma|} \cap O & \text{if } \sigma \text{ is a prefix of some } w|\_I \text{ with } w \in S \text{, and} \\ \emptyset & \text{otherwise.} \end{cases}$$

where w|_I denotes the trace w restricted to I, formally wᵢ ∩ I for all i ≥ 0. Note that if there are multiple candidates w ∈ S, then w_{|σ|} ∩ O is the same for all of them because of the determinism required by D_{I→O}. By construction, all traces in S are contained in f, and with S ⊨ ϕ it holds that f ⊨ ϕ, as ϕ is an existential formula.

**Theorem 1.** *Realizability of existential HyperLTL specifications is decidable.*

*Proof.* The formula ψ from Lemma 1 is in the ∃∗∀² fragment, for which satisfiability is decidable [9].

**Corollary 1.** *Realizability of* ∃∗ *HyperLTL specifications is* PSpace*-complete.*

*Proof.* Given an existential HyperLTL formula, Lemma 1 provides a linear reduction to satisfiability in the ∃∗∀² fragment. The satisfiability problem for a bounded number of universal quantifiers is in PSpace [9]. Hardness follows from LTL satisfiability, which is equivalent to satisfiability in the ∃¹ fragment.

(a) An architecture of two processes, where process p₁ produces c from a and process p₂ produces d from b.

(b) The same architecture as on the left, where only the inputs of process p₂ are changed to a and b.

**Fig. 1.** Distributed architectures

*∀∗* **Fragment.** In the following, we use the *distributed synthesis* problem, i.e., the problem of whether there is an implementation of the processes in a distributed *architecture* that satisfies an LTL formula. Formally, a distributed architecture A is a tuple ⟨P, p_env, I, O⟩, where P is a finite set of processes with a distinguished environment process p_env ∈ P. The functions I : P → 2^AP and O : P → 2^AP define the inputs and outputs of the processes. While processes may share the same inputs (in case of broadcasting), the outputs of processes must be pairwise disjoint, i.e., for all p ≠ p′ ∈ P it holds that O(p) ∩ O(p′) = ∅. W.l.o.g. we assume that I(p_env) = ∅. The distributed synthesis problem for architectures without *information forks* [13] is decidable. Example architectures are depicted in Fig. 1: the architecture in Fig. 1a contains an information fork, while the architecture in Fig. 1b does not. Furthermore, the processes in Fig. 1b can be ordered linearly according to the subset relation on their inputs.

**Theorem 2.** *The synthesis problem for universal HyperLTL is undecidable.*

*Proof.* In the ∀∗ fragment (and thus in the ∃∗∀∗ fragment), we can encode a distributed architecture [13] for which LTL synthesis is undecidable. In particular, we can encode the architecture shown in Fig. 1a, which specifies that c depends only on a and, analogously, that d depends only on b. This can be encoded by D_{{a}→{c}} and D_{{b}→{d}}. The LTL synthesis problem for this architecture has already been shown to be undecidable [13], i.e., given an LTL formula over I = {a, b} and O = {c, d}, we cannot automatically construct processes p₁ and p₂ that realize the formula.

**Linear** *∀∗* **Fragment.** To characterize the linear fragment of HyperLTL, we present a transformation from a formula with arbitrarily many universal quantifiers to a formula with only one quantifier. This transformation collapses the universal quantifiers into a single one and renames the path variables accordingly. For example, ∀π₁∀π₂. a_{π₁} ∨ a_{π₂} is transformed into the equivalent ∀¹ formula ∀π. a_π ∨ a_π. However, this transformation does not always produce equivalent formulas, as ∀π₁∀π₂. (a_{π₁} ↔ a_{π₂}) is not equivalent to its collapsed form ∀π. (a_π ↔ a_π). Let ϕ be ∀π₁ ··· ∀πₙ. ψ. We define the collapsed formula of ϕ as *collapse*(ϕ) := ∀π. ψ[π₁ → π][π₂ → π] ... [πₙ → π], where ψ[πᵢ → π] replaces all occurrences of πᵢ in ψ with π. Although the collapsed formula is not always equivalent to the original formula, we can use it as an indicator of whether it is possible at all to express a universal formula with only one quantifier, as stated in the following lemma.
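The collapse transformation is a plain renaming over the formula body. A minimal Python sketch over a tuple-based AST (our own encoding; the paper defines *collapse* purely syntactically):

```python
def collapse(quantified_vars, body, fresh="pi"):
    """collapse(forall pi1..pin. body): rename every trace variable in the
    body to a single fresh variable. Atoms are ("ap", prop, var); every
    other node is (operator, child, ...)."""
    def rename(node):
        if isinstance(node, tuple) and node[0] == "ap":
            return ("ap", node[1], fresh)
        if isinstance(node, tuple):
            return (node[0],) + tuple(rename(c) for c in node[1:])
        return node
    return (fresh,), rename(body)

# forall pi1, pi2. a_pi1 or a_pi2  collapses to  forall pi. a_pi or a_pi
vars_, body = collapse(("pi1", "pi2"),
                       ("or", ("ap", "a", "pi1"), ("ap", "a", "pi2")))
assert vars_ == ("pi",)
assert body == ("or", ("ap", "a", "pi"), ("ap", "a", "pi"))
```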

**Lemma 2.** *Either* ϕ ≡ *collapse*(ϕ) *or* ϕ *has no equivalent* ∀¹ *formula.*

*Proof.* Suppose there is some ψ ∈ ∀¹ with ψ ≡ ϕ. We show that ψ ≡ *collapse*(ϕ). Let T be an arbitrary set of traces and let 𝒯 = {{w} | w ∈ T} be the set of its singleton subsets. Because ψ ∈ ∀¹, T ⊨ ψ is equivalent to ∀T′ ∈ 𝒯. T′ ⊨ ψ, which is by assumption equivalent to ∀T′ ∈ 𝒯. T′ ⊨ ϕ. Now, ϕ operates on singleton trace sets only. This means that all quantified paths have to be the same, so we can use the same path variable for all of them. Hence, ∀T′ ∈ 𝒯. T′ ⊨ ϕ ↔ T′ ⊨ *collapse*(ϕ), which is again equivalent to T ⊨ *collapse*(ϕ). Because ψ ≡ *collapse*(ϕ) and ψ ≡ ϕ, it holds that ϕ ≡ *collapse*(ϕ).

The LTL realizability problem for distributed architectures without information forks [13] is decidable. These architectures are in some sense *linear*, i.e., the processes can be ordered such that lower processes always have a subset of the information of upper processes. The linear fragment of universal HyperLTL addresses exactly these architectures.

In the following, we sketch the characterization of the linear fragment of HyperLTL. Given a formula ϕ, we search for variable dependencies of the form D_{J→{o}} with J ⊆ I and o ∈ O in the formula. If the part of ϕ that relates multiple paths consists only of such constraints D_{J→{o}}, with the rest being an LTL property, we can interpret ϕ as a description of a distributed architecture. If, furthermore, the D_{Jᵢ→{oᵢ}} constraints can be ordered such that Jᵢ ⊆ Jᵢ₊₁ for all i, the architecture is linear. The following definition makes this check precise.


**Definition 2 (linear fragment of** ∀∗**).** *A formula* ϕ *is in the linear fragment of* ∀∗ *iff for all* oᵢ ∈ O *there is a* Jᵢ ⊆ I *such that* ϕ ∧ D_{I→O} ≡ *collapse*(ϕ) ∧ ⋀_{oᵢ∈O} D_{Jᵢ→{oᵢ}} *and* Jᵢ ⊆ Jᵢ₊₁ *for all* i*.*

Note that every ∀¹ formula ϕ (and every ϕ that is collapsible to a ∀¹ formula) is in the linear fragment, because we can set all Jᵢ = I; additionally, *collapse*(ϕ) = ϕ holds.

As an example of a formula in the linear fragment of ∀∗, consider ϕ = ∀π∀π′. D_{{a}→{c}} ∧ □(c_π ↔ d_π) ∧ □(b_π ↔ e_π) with I = {a, b} and O = {c, d, e}. The corresponding formula asserting input-determinism is ϕ_det = ϕ ∧ D_{I→O}. One possible choice of J's is {a, b} for c, {a} for d, and {a, b} for e. Note that one can use either {a, b} or {a} for c, as D_{{a}→{d}} ∧ □(c_π ↔ d_π) implies D_{{a}→{c}}. However, the apparent alternative {b} for e would yield an undecidable architecture. It holds that ϕ_det and *collapse*(ϕ) ∧ D_{{a,b}→{c}} ∧ D_{{a}→{d}} ∧ D_{{a,b}→{e}} are equivalent and, thus, that ϕ is in the linear fragment.
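Once candidate dependency sets Jᵢ have been chosen, the side condition Jᵢ ⊆ Jᵢ₊₁ amounts to the sets forming a chain under inclusion. A small sketch (our own helper, assuming the Jᵢ are already given as a mapping from outputs to input sets):

```python
def is_linear(dep):
    """Check that the dependency sets J_i can be ordered into a chain
    J_1 ⊆ J_2 ⊆ ...; sorting by size suffices, since set sizes are
    monotone along any inclusion chain."""
    sets = sorted(dep.values(), key=len)
    return all(a <= b for a, b in zip(sets, sets[1:]))

# The choice from the example above is linear...
assert is_linear({"c": {"a", "b"}, "d": {"a"}, "e": {"a", "b"}})
# ...while picking {b} for e breaks the chain ({a} and {b} are incomparable).
assert not is_linear({"d": {"a"}, "e": {"b"}})
```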

**Theorem 3.** *The linear fragment of universal HyperLTL is decidable.*

*Proof.* It holds that ϕ ≡ *collapse*(ϕ) ∧ ⋀_{oᵢ∈O} D_{Jᵢ→{oᵢ}} for some Jᵢ's. The LTL distributed realizability problem for *collapse*(ϕ) in the constructed architecture A is equivalent to the HyperLTL realizability of ϕ, as the architecture A represents exactly the input-determinism expressed by the formula ⋀_{oᵢ∈O} D_{Jᵢ→{oᵢ}}. The architecture is linear and, thus, the realizability problem is decidable.

*∃∗∀¹* **Fragment.** In this fragment, we consider arbitrarily many existential path quantifiers followed by a single universal path quantifier. This fragment turns out to still be decidable. We solve the realizability problem for this fragment by reducing it to a decidable fragment of the distributed realizability problem.

**Theorem 4.** *Realizability of* ∃∗∀¹ *HyperLTL specifications is decidable.*

*Proof.* Let ϕ be ∃π₁ ... ∃πₙ∀π′. ψ. We reduce the realizability problem of ϕ to the distributed realizability problem for LTL. For every existential path quantifier πᵢ, we introduce a copy of the atomic propositions, written a_{πᵢ} for a ∈ AP. Intuitively, these select the paths in the strategy tree on which the existential path quantifiers are evaluated. Thus, these propositions (1) have to encode an actual path in the strategy tree and (2) may not depend on the branching of the strategy tree. To ensure (1), we add the LTL constraint □(I_{πᵢ} = I_{π′}) → □(O_{πᵢ} = O_{π′}), which asserts that if the inputs correspond to some path in the strategy tree, the outputs on those paths have to be the same. Property (2) is guaranteed by the distributed architecture: the processes generating the propositions a_{πᵢ} do not depend on the environment output. The resulting architecture A_ϕ is ⟨{p_env, p, p′}, p_env, {p ↦ ∅, p′ ↦ I_{π′}}, {p_env ↦ I_{π′}, p ↦ ⋃_{1≤i≤n} O_{πᵢ} ∪ I_{πᵢ}, p′ ↦ O_{π′}}⟩. It is easy to verify that A_ϕ does not contain an information fork, thus its realizability problem is decidable. The LTL specification θ is ψ ∧ ⋀_{1≤i≤n} □(I_{πᵢ} = I_{π′}) → □(O_{πᵢ} = O_{π′}). The implementation of process p′ (if it exists) is a model for the HyperLTL formula (process p produces the witnesses for the ∃ quantifiers). Conversely, a model for ϕ can be used as an implementation of p′. Thus, the distributed synthesis problem ⟨A_ϕ, θ⟩ has a solution if, and only if, ϕ is realizable.

*∀∗∃∗* **Fragment.** The last fragment to consider is the ∀∗∃∗ fragment. Whereas the ∃∗∀¹ fragment remains decidable, the realizability problem of ∀∗∃∗ turns out to be undecidable, even when restricted to only one quantifier of each sort (∀¹∃¹).

**Theorem 5.** *Realizability of* ∀∗∃∗ *HyperLTL is undecidable.*

*Proof.* The proof proceeds via a reduction from Post's Correspondence Problem (PCP) [22]. The basic idea follows the proof in [9].

#### **4 Bounded Realizability**

We propose an algorithm to synthesize strategies from specifications given in universal HyperLTL by searching for finite generators of realizing strategies. We encode this search as a satisfiability problem for a decidable constraint system.

*Transition Systems.* A *transition system* S is a tuple ⟨S, s₀, τ, l⟩, where S is a finite set of states, s₀ ∈ S is the designated initial state, τ : S × 2^I → S is the transition function, and l : S → 2^O is the state-labeling or output function. We generalize the transition function to sequences over 2^I by defining τ* : (2^I)* → S recursively as τ*(ε) = s₀ and τ*(w₀ ··· wₙ₋₁wₙ) = τ(τ*(w₀ ··· wₙ₋₁), wₙ) for w₀ ··· wₙ₋₁wₙ ∈ (2^I)⁺. A transition system S *generates* the strategy f if f(w) = l(τ*(w)) for every w ∈ (2^I)*. A strategy f is called *finite-state* if there exists a transition system that generates it.
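The generator definition translates directly into code: fold τ over the input word and emit the label of the state reached. A Python sketch (our own; the two-state system below, which remembers whether the last input was "i", is an assumption for illustration):

```python
class TransitionSystem:
    def __init__(self, s0, tau, label):
        self.s0, self.tau, self.label = s0, tau, label

    def run(self, word):
        """tau*(w): fold the transition function over an input sequence."""
        s = self.s0
        for w in word:
            s = self.tau[s, w]
        return s

    def strategy(self, word):
        """The generated strategy f(w) = l(tau*(w))."""
        return self.label[self.run(word)]

# One bit of memory: state 1 iff the previous input was "i".
ts = TransitionSystem(
    s0=0,
    tau={(0, "i"): 1, (0, "-"): 0, (1, "i"): 1, (1, "-"): 0},
    label={0: set(), 1: {"o"}},
)
assert ts.strategy(["i", "-"]) == set()
assert ts.strategy(["i", "i"]) == {"o"}
```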

*Overview.* We first sketch the synthesis procedure and then proceed with a description of the intermediate steps. Let ϕ be a universal HyperLTL formula ∀π<sup>1</sup> ···∀πn. ψ. We build the automaton A<sup>ψ</sup> whose language is the set of tuples of traces that satisfy ψ. We then define the acceptance of a transition system S on A<sup>ψ</sup> by means of the self-composition of S. Lastly, we encode the existence of a transition system accepted by A<sup>ψ</sup> as an SMT constraint system.

*Example 1.* Throughout this section, we will use the following (simplified) running example. Assume we want to synthesize a system that keeps decisions secret until it is allowed to publish. Thus, our system has three input signals *decision*, indicating whether a decision was made, the secret *value*, and a signal to *publish* results. Furthermore, our system has two outputs, a *high* output *internal* that stores the value of the last decision, and a *low* output *result* that indicates the result. No information about decisions should be inferred until publication. To specify the functionality, we propose the LTL specification

$$
\begin{aligned}
& \square(decision \to (value \leftrightarrow \bigcirc internal)) \\
\wedge \ & \square(\neg decision \to (internal \leftrightarrow \bigcirc internal)) \\
\wedge \ & \square(publish \to \bigcirc (internal \leftrightarrow result)) \ .
\end{aligned}
\tag{2}
$$

The solution produced by the LTL synthesis tool BoSy [8], shown in Fig. 2, clearly violates our intention that results should be secret until publication: whenever a decision is made, the output *result* changes as well.

We formalize the property that no information about the decision can be inferred from *result* until publication as the HyperLTL formula

$$\forall \pi \forall \pi'. \left(publish\_{\pi} \lor publish\_{\pi'} \right) \mathcal{R} \left(result\_{\pi} \leftrightarrow result\_{\pi'} \right) \ . \tag{3}$$

**Fig. 2.** Synthesized solutions for Example 1.

It asserts that, for every pair of traces, the *result* signals have to be the same until (if ever) there is a *publish* signal on either trace. A solution satisfying both the functional specification and the hyperproperty is shown in Fig. 2. The system switches states whenever there is a decision with a different value than before, and it only exposes the decision in case there is a prior publish command.

We proceed with introducing the necessary preliminaries for our algorithm.

*Automata.* A universal co-Büchi automaton A over a finite alphabet Σ is a tuple ⟨Q, q₀, δ, F⟩, where Q is a finite set of states, q₀ ∈ Q is the designated initial state, δ ⊆ Q × 2^Σ × Q is the transition relation, and F ⊆ Q is the set of rejecting states. Given an infinite word σ = σ₀σ₁σ₂ ··· ∈ (2^Σ)^ω, a run of σ on A is an infinite path q₀q₁q₂ ··· ∈ Q^ω where for all i ≥ 0 it holds that (qᵢ, σᵢ, qᵢ₊₁) ∈ δ. A run is accepting if it contains only finitely many rejecting states. A accepts a word σ if *all* runs of σ on A are accepting. The language of A, written L(A), is the set {σ ∈ (2^Σ)^ω | A accepts σ}. We represent automata as directed graphs with vertex set Q and a symbolic representation of the transition relation δ as propositional Boolean formulas B(Σ). The rejecting states in F are marked by double lines. The automata for the LTL and HyperLTL specifications from Example 1 are depicted in Fig. 3.
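For ultimately periodic (lasso) words, universal co-Büchi acceptance reduces to a simple graph check: the word is accepted iff no reachable cycle of the product of the lasso with the automaton contains a rejecting state. A Python sketch (our own; it uses an explicit transition relation instead of the symbolic B(Σ) representation):

```python
def accepts(delta, q0, rejecting, stem, loop):
    """Universal co-Büchi acceptance of the lasso word stem·loop^ω: accept
    iff no rejecting automaton state lies on a reachable cycle of the
    lasso/automaton product graph (otherwise some run is rejecting)."""
    word = stem + loop

    def succ(node):
        pos, q = node
        nxt = pos + 1 if pos + 1 < len(word) else len(stem)
        return [(nxt, q2) for (q1, a, q2) in delta
                if q1 == q and a == word[pos]]

    # All product nodes reachable from the initial node.
    reach, todo = set(), [(0, q0)]
    while todo:
        node = todo.pop()
        if node not in reach:
            reach.add(node)
            todo.extend(succ(node))
    # A rejecting node that can reach itself lies on a rejecting cycle.
    for node in reach:
        if node[1] in rejecting:
            seen, todo = set(), succ(node)
            while todo:
                m = todo.pop()
                if m == node:
                    return False
                if m not in seen:
                    seen.add(m)
                    todo.extend(succ(m))
    return True

# Automaton for "always a" (an assumption for illustration): reading "b"
# enters the rejecting sink qr.
delta = [("q0", "a", "q0"), ("q0", "b", "qr"),
         ("qr", "a", "qr"), ("qr", "b", "qr")]
assert accepts(delta, "q0", {"qr"}, ["a"], ["a"])
assert not accepts(delta, "q0", {"qr"}, ["a"], ["b"])
```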

*Run Graph.* The run graph of a transition system S = ⟨S, s₀, τ, l⟩ on a universal co-Büchi automaton A = ⟨Q, q₀, δ, F⟩ is a directed graph ⟨V, E⟩, where V = S × Q is the set of vertices and E ⊆ V × V is the edge relation with

$$((s, q), (s', q')) \in E \ \text{ iff } \ \exists i \in 2^I. \, \exists o \in 2^O. \ (\tau(s, i) = s') \land (l(s) = o) \land (q, i \cup o, q') \in \delta \ .$$

A run graph is accepting if every path (starting at the initial vertex (s₀, q₀)) visits only finitely many rejecting states. To show acceptance, we annotate every reachable node in the run graph with a natural number m, such that any path starting in the initial vertex contains fewer than m visits of rejecting states. Such an annotation exists if, and only if, the run graph is accepting [14].
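The annotation can be computed by a fixpoint over the run graph: propagate counters along the edges, incrementing at rejecting vertices, and give up once a counter exceeds the number of vertices, which can only happen when a rejecting cycle is reachable. A Python sketch of this criterion (our own formulation, following the idea of [14]):

```python
def annotate(vertices, edges, init, rejecting):
    """Fixpoint computation of the annotation: lam[v] bounds the number of
    rejecting vertices on any path from init to v. Returns None when the
    counters diverge, i.e. a rejecting cycle is reachable and the run
    graph is not accepting."""
    bound = len(vertices) + 1
    lam = {init: 1 if init in rejecting else 0}
    changed = True
    while changed:
        changed = False
        for (u, v) in edges:
            if u not in lam:
                continue
            val = lam[u] + (1 if v in rejecting else 0)
            if val > lam.get(v, -1):
                if val > bound:   # exceeds any cycle-free count: reject
                    return None
                lam[v] = val
                changed = True
    return lam

acyclic = [("A", "B"), ("B", "C")]
assert annotate({"A", "B", "C"}, acyclic, "A", {"B"}) == {"A": 0, "B": 1, "C": 1}
# Adding the back edge C -> B creates a cycle through the rejecting vertex B.
assert annotate({"A", "B", "C"}, acyclic + [("C", "B")], "A", {"B"}) is None
```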

(a) Automaton accepting the language defined by the LTL formula in (2)

(b) Automaton accepting the language defined by the HyperLTL formula in (3)

**Fig. 3.** Universal co-Büchi automata recognizing the languages from Example 1.

*Self-composition.* The model checking of universal HyperLTL formulas [12] is based on self-composition. Let *prj*ᵢ be the projection to the i-th element of a tuple. Let *zip* denote the usual function that maps an n-tuple of sequences to a single sequence of n-tuples, for example, zip([1, 2, 3], [4, 5, 6]) = [(1, 4), (2, 5), (3, 6)], and let *unzip* denote its inverse. The transition system Sⁿ is the n-fold self-composition of S = ⟨S, s₀, τ, l⟩ if Sⁿ = ⟨Sⁿ, s₀ⁿ, τⁿ, lⁿ⟩ and, for all s, s′ ∈ Sⁿ, α ∈ (2^I)ⁿ, and β ∈ (2^O)ⁿ, we have that τⁿ(s, α) = s′ and lⁿ(s) = β iff for all 1 ≤ i ≤ n it holds that τ(*prj*ᵢ(s), *prj*ᵢ(α)) = *prj*ᵢ(s′) and l(*prj*ᵢ(s)) = *prj*ᵢ(β). If T is the set of traces generated by S, then {*zip*(t₁, ..., tₙ) | t₁, ..., tₙ ∈ T} is the set of traces generated by Sⁿ.
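The *zip* and *unzip* functions are exactly Python's built-in zip applied in both directions; a minimal sketch matching the example in the text:

```python
def zip_traces(*traces):
    """zip: an n-tuple of sequences to a single sequence of n-tuples."""
    return list(zip(*traces))

def unzip_traces(zipped):
    """unzip: the inverse of zip_traces."""
    return [list(t) for t in zip(*zipped)]

z = zip_traces([1, 2, 3], [4, 5, 6])
assert z == [(1, 4), (2, 5), (3, 6)]
assert unzip_traces(z) == [[1, 2, 3], [4, 5, 6]]
```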

We construct the universal co-Büchi automaton A_ψ such that the language of A_ψ is the set of words w with *unzip*(w) = Π and Π ⊨_∅ ψ, i.e., the tuples of traces that satisfy ψ. We obtain this automaton by dualizing the non-deterministic Büchi automaton for ¬ψ [4], i.e., changing the branching from non-deterministic to universal and the acceptance condition from Büchi to co-Büchi. Hence, S satisfies a universal HyperLTL formula ϕ = ∀π₁ ... ∀πₙ. ψ if the traces generated by the self-composition Sⁿ are a subset of L(A_ψ).

**Lemma 3.** *A transition system* S *satisfies the universal HyperLTL formula* ϕ = ∀π₁ ··· ∀πₙ. ψ *if the run graph of* Sⁿ *and* A_ψ *is accepting.*

*Synthesis.* Let S = ⟨S, s₀, τ, l⟩ and A_ψ = ⟨Q, q₀, δ, F⟩. We encode the synthesis problem as an SMT constraint system, using uninterpreted function symbols for the transition system and the annotation. For the transition system, those functions are the transition function τ : S × 2^I → S and the labeling function l : S → 2^O. The annotation is split into two parts: a reachability constraint λ^𝔹 : Sⁿ × Q → 𝔹 indicating whether a state in the run graph is reachable, and a counter λ^# : Sⁿ × Q → ℕ that maps every reachable vertex to the maximal number of rejecting states visited by any path from the initial vertex. The resulting constraint asserts that there is a transition system with an accepting run graph:

$$\forall s, s' \in S^n. \ \forall q, q' \in Q. \ \forall i \in \left(2^I\right)^n.$$

$$\left(\lambda^{\mathbb{B}}(s, q) \land \tau^n(s, i) = s' \land \left(q, i \cup l^n(s), q'\right) \in \delta\right) \to \lambda^{\mathbb{B}}(s', q') \land \lambda^{\#}(s', q') \vartriangleright \lambda^{\#}(s, q),$$

where ▷ is > if q′ ∈ F and ≥ otherwise.

**Theorem 6.** *The constraint system is satisfiable with bound* b *if, and only if, there is a transition system* S *of size* b *that realizes the HyperLTL formula.*

We extract a realizing implementation by asking the satisfiability solver to generate a model for the uninterpreted functions that encode the transition system.
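A quick way to sanity-check such a model is to validate the annotation directly on an explicit run graph. The sketch below is our own helper (independent of the SMT encoding): it checks that reachability propagates along edges and that the counter never decreases, increasing strictly on transitions into rejecting states.

```python
def annotation_valid(edges, rejecting, lam_b, lam_n):
    """Check the annotation conditions on an explicit run graph.

    edges:     list of ((s, q), (s2, q2)) run-graph transitions
    rejecting: set of co-Buechi rejecting automaton states F
    lam_b:     dict vertex -> bool, the reachability annotation
    lam_n:     dict vertex -> int, the counter annotation
    """
    for v, w in edges:
        if not lam_b[v]:
            continue                      # constraint only fires on reachable vertices
        if not lam_b[w]:
            return False                  # reachability must propagate
        if w[1] in rejecting:
            if not lam_n[w] > lam_n[v]:   # strict increase into rejecting states
                return False
        elif not lam_n[w] >= lam_n[v]:
            return False
    return True
```

On a run graph whose only cycle passes through a rejecting state, no valid counter exists, matching the co-Büchi acceptance condition.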

#### **5 Bounded Unrealizability**

So far, we focused on the positive case, providing an algorithm for finding small solutions if they exist. In this section, we shift to detecting whether a universal HyperLTL formula is unrealizable. We adapt the definition of counterexamples to realizability for LTL [15] to HyperLTL as follows. Let ϕ be a universal HyperLTL formula ∀π1 ··· ∀πn. ψ over inputs I and outputs O. A *counterexample to realizability* is a set of input traces P ⊆ (2^I)^ω such that for every strategy f : (2^I)^* → 2^O the labeled traces P_f ⊆ (2^{I∪O})^ω satisfy ¬ϕ = ∃π1 ··· ∃πn. ¬ψ.

**Proposition 1.** *A universal HyperLTL formula* ϕ = ∀π<sup>1</sup> ···∀πn. ψ *is unrealizable if there is a counterexample* P *to realizability.*

*Proof.* Assume for contradiction that ϕ is realizable by a strategy f. As P is a counterexample to realizability, we know P_f ⊨ ∃π1 ··· ∃πn. ¬ψ. This means that there is an assignment Π_P ∈ (V → P_f) with Π_P ⊨_{P_f} ¬ψ and, equivalently, Π_P ⊭_{P_f} ψ. Therefore, not all assignments Π ∈ (V → P_f) satisfy Π ⊨_{P_f} ψ, which implies P_f ⊭ ∀π1 ··· ∀πn. ψ = ϕ. Since ϕ is universal, we can infer f ⊭ ϕ, which yields the contradiction. Thus, ϕ is unrealizable.

Despite being independent of strategy trees, P has in many cases a finite representation. Consider, for example, the unrealizable specification ϕ1 = ∀π∀π′. □(i_π ↔ i_π′), for which the set P1 = {∅^ω, {i}^ω} is a counterexample to realizability. As a second example, consider ϕ2 = ∀π∀π′. □(o_π ↔ o_π′) ∧ (i_π ↔ ◯o_π), which places conflicting requirements on o. P1 is a counterexample to realizability for ϕ2 as well: by choosing a different valuation of i in the first step, the system is forced either to react with different valuations of o (violating the first conjunct) or to not correctly repeat the initial value of i (violating the second conjunct).

There are, however, already linear specifications for which the set of counterexample paths is not finite and depends on the strategy tree [16]. For example, the specification ∀π. □(i_π ↔ o_π) is unrealizable, as the system cannot predict future values of the environment. There is no finite set of traces witnessing this: for every finite set of traces, there is a strategy tree such that □(i_π ↔ o_π) holds on every such trace. On the other hand, there is a simple *counterexample strategy*, that is, a strategy that observes output sequences and produces inputs. In this example, the counterexample strategy inverts the outputs given by the system, so that □(i_π ↮ o_π) is guaranteed against any system strategy.

We combine those two approaches, selecting counterexample paths and using strategic behavior. A k-counterexample strategy for HyperLTL observes k output sequences and produces k inputs, where k ≥ n is a new parameter. The counterexample strategy is winning if (1) the traces given by the system player do not correspond to a strategy, or (2) the body of the HyperLTL formula is violated for some n-element subset of the k traces. Regarding property (1), consider two traces on which the system player produces different outputs initially. Clearly, those two traces cannot be generated by any system strategy, since the initial state (the root labeling) is fixed.

The search for a k-counterexample strategy can be reduced to LTL synthesis using k-tuple input propositions O^k and k-tuple output propositions I^k (the counterexample strategy observes outputs and produces inputs), and the specification

$$\neg D\_{I^k \mapsto O^k} \lor \bigvee\_{\substack{P \subseteq \{1, \dots, k\} \text{ with } |P| = n}} \neg \psi[P]$$

where ψ[P] denotes the formula obtained by replacing each a_{π_i} with the P_i-th component of the combined input/output k-tuple.
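The disjunction over subsets P can be enumerated mechanically. A minimal sketch (function name ours) lists the assignments of the n trace variables to the k traces, one per disjunct ¬ψ[P]:

```python
from itertools import combinations

def subset_instantiations(k, n):
    """Map trace variables pi_1..pi_n, in order, to each n-element
    subset of the k traces; one dictionary per disjunct."""
    return [dict(zip(range(1, n + 1), P))
            for P in combinations(range(1, k + 1), n)]
```

The number of disjuncts grows as C(k, n), which is why small values of k (often k = 2, as observed in the evaluation) are desirable.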

**Theorem 7.** *A universal HyperLTL formula* ϕ = ∀π<sup>1</sup> ···∀πn. ψ *is unrealizable if there is a* k*-counterexample strategy for some* k ≥ n*.*

#### **6 Evaluation**

We implemented a prototype synthesis tool, called BoSyHyper<sup>1</sup>, for universal HyperLTL based on the bounded synthesis algorithm described in Sect. 4. Furthermore, we implemented the search for counterexamples proposed in Sect. 5. Thus, BoSyHyper is able to characterize realizability and unrealizability of universal HyperLTL formulas.

We base our implementation on the LTL synthesis tool BoSy [8]. For efficiency, we split the specifications into two parts, a part containing the linear (LTL) specification, and a part containing the hyperproperty given as HyperLTL formula. Consequently, we build two constraint systems, one using the standard bounded synthesis approach [14] and one using the approach described in Sect. 4. Before solving, those constraints are combined into a single SMT query. This results in a much more concise constraint system compared to the one where the complete specification is interpreted as a HyperLTL formula. For solving the SMT queries, we use the Z3 solver [20]. We continue by describing the benchmarks used in our experiments.

<sup>1</sup> BoSyHyper is available at https://www.react.uni-saarland.de/tools/bosy/.

**Fig. 4.** Synthesized solutions of the mutual exclusion protocols: (a) non-symmetric solution, (b) counterexample to symmetry, (c) symmetry-breaking solution.

*Symmetric Mutual Exclusion.* Our first example demonstrates the ability to specify symmetry in HyperLTL for a simple mutual exclusion protocol. Let r1 and r2 be input signals representing *requests* to enter a critical section and g1/g2 the respective grants. Every request should eventually be answered, □(r_i → ◇g_i) for i ∈ {1, 2}, but not both at the same time, □¬(g1 ∧ g2). The minimal LTL solution is depicted in Fig. 4a. It is well known that no mutex protocol can ensure perfect symmetry [19]; thus, when adding the symmetry constraint specified by the HyperLTL formula ∀π∀π′. (r1_π ↮ r2_π′) R (g1_π ↔ g2_π′), the formula becomes unrealizable. Our tool produces the counterexample shown in Fig. 4b. By adding another input signal *tie* that breaks the symmetry in case of simultaneous requests and modifying the symmetry constraint to ∀π∀π′. ((r1_π ↮ r2_π′) ∨ (tie_π ↔ ¬tie_π′)) R (g1_π ↔ g2_π′), we obtain the solution depicted in Fig. 4c. We further evaluated the same properties on a version that forbids spurious grants; these instances are reported in Table 2 with the prefix *full*.

*Distributed and Fault-Tolerant Systems.* In Sect. 3 we presented a reduction of arbitrary distributed architectures to HyperLTL. As an example for our evaluation, consider a setting with two processes, one for *encoding* input signals and one for *decoding*. Both processes can be synthesized simultaneously using a single HyperLTL specification. The (linear) correctness condition states that the decoded signal is always equal to the inputs given to the encoder. Furthermore, the encoder and decoder should solely depend on the inputs and the encoded signal, respectively. Additionally, we can specify desired properties about the encoding like fault-tolerance [16] or Hamming distance of code words [12]. The results are reported in Table 2 where i-j-x means i input bits, j encoded bits, and x represents the property. The property is either tolerance against a single Byzantine signal failure or a guaranteed Hamming distance of code words.

*CAP Theorem.* The CAP theorem due to Brewer [2] states that it is impossible to design a distributed system that simultaneously provides Consistency, Availability, and Partition tolerance (CAP). This example has been considered before [16] to evaluate a technique that could automatically detect unrealizability. However, when either Consistency, Availability, or Partition tolerance is dropped, the corresponding instances (AP, CP, and CA) become realizable, which the previous work was not able to prove. We show that our implementation can establish both unrealizability of CAP and realizability of AP, CP, and CA. In contrast to the previous encoding [16], we are not limited to acyclic architectures.

*Long-Term Information-Flow.* Previous work on model checking hyperproperties [12] found that an implementation of the commonly used *I2C* bus protocol could remember input values ad infinitum. For example, it could not be verified that information given to the implementation eventually leaves it, i.e., is forgotten. This is especially unfortunate in high-security contexts. We consider a simple bus protocol inspired by the widely used *I2C* protocol. Our example protocol has the inputs *send* for initiating a transmission, *in* for the value that should be transferred, and an *ack*nowledgment bit indicating successful transmission. The bus master waits in an *idle* state until a *send* is received. Afterwards, it transmits a header sequence, followed by the value of *in*, waits for an acknowledgment, and then indicates *success* or *failure* to the sender before returning to the idle state. We specify the property that the *in*put has no influence on the *data* that is sent, which is obviously violated (instance NI1). As a second property, we check that this information leak cannot persist arbitrarily long (NI2), for which there is a realizing implementation.

*Dining Cryptographers.* Recall the dining cryptographers problem introduced earlier. This benchmark is interesting as it contains two types of hyperproperties. First, there is information-flow between the three cryptographers, where some secrets (s_ab, s_ac, s_bc) are shared between pairs of cryptographers. In the formalization, we have four entities: three processes describing the cryptographers (out_i) and one process computing from the out_i whether the group has paid or not (p_g). Second, the final result should only disclose whether one of the cryptographers has paid or the NSA has. This can be formalized as an indistinguishability property between different executions. For example, compare two traces π and π′ where C_a has paid on π and C_b has paid on π′. The outputs of both have to be the same if their common secret s_ab differs on those two traces (while all other secrets s_ac and s_bc are the same). This ensures that, to an outside observer, a flipped output can be either the result of a different shared secret or of the announcement. Lastly, the linear specification asserts that p_g ↔ ¬p_NSA.
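The information-flow structure of this benchmark follows the classic dining cryptographers protocol. The sketch below illustrates the underlying computation (standard protocol; the variable names are ours, not the synthesized implementation): each cryptographer announces the XOR of their two shared secrets, flipped if they paid themselves.

```python
def announcements(payer, s_ab, s_ac, s_bc):
    """Announcement of each cryptographer: XOR of their two shared
    secrets, flipped if they paid (payer in {'a','b','c'}, None = NSA)."""
    out_a = s_ab ^ s_ac ^ (payer == 'a')
    out_b = s_ab ^ s_bc ^ (payer == 'b')
    out_c = s_ac ^ s_bc ^ (payer == 'c')
    return out_a, out_b, out_c

def group_paid(outs):
    """XOR of all announcements: every shared secret cancels out,
    leaving only whether some cryptographer (not the NSA) paid."""
    return outs[0] ^ outs[1] ^ outs[2]
```

Since each secret appears in exactly two announcements, the XOR of all three reveals the payer bit and nothing about the secrets, which is the indistinguishability property described above.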

**Table 2.** Results of BoSyHyper on the benchmark sets described in Sect. 6. The benchmarks ran on a machine with a dual-core Core i7, 3.3 GHz, and 16 GB memory.

*Results.* Table 2 reports the results of our benchmarks. We distinguish between state-labeled (*Moore*) and transition-labeled (*Mealy*) transition systems. Note that the counterexample strategies use the opposite type of transition system, i.e., a Mealy system strategy corresponds to a state-labeled (Moore) environment strategy. Typically, Mealy strategies are more compact, i.e., need smaller transition systems, and this is confirmed by our experiments. BoSyHyper is able to solve most of the examples, providing realizing implementations or counterexamples. Regarding the unrealizable benchmarks, we observe that usually two simultaneously generated paths (k = 2) are enough, with the exception of the encoder example. Overall, the results are encouraging, showing that we can solve a variety of instances with non-trivial information flow.

### **7 Conclusion**

In this paper, we have considered the reactive realizability problem for specifications given in the temporal logic HyperLTL. We gave a complete characterization of the decidable fragments based on the quantifier prefix and, additionally, identified a decidable fragment within the (in general undecidable) universal fragment of HyperLTL. Furthermore, we presented two algorithms to detect realizable and unrealizable HyperLTL specifications, one based on bounding the system implementation and one based on bounding the number of counterexample paths. Our prototype implementation shows that our approach is able to synthesize systems with complex information-flow properties.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **Reactive Control Improvisation**

Daniel J. Fremont(B) and Sanjit A. Seshia

University of California, Berkeley, USA {dfremont,sseshia}@berkeley.edu

**Abstract.** Reactive synthesis is a paradigm for automatically building correct-by-construction systems that interact with an unknown or adversarial environment. We study how to do reactive synthesis when part of the specification of the system is that its behavior should be *random*. Randomness can be useful, for example, in a network protocol fuzz tester whose output should be varied, or a planner for a surveillance robot whose route should be unpredictable. However, existing reactive synthesis techniques do not provide a way to ensure random behavior while maintaining functional correctness. Towards this end, we generalize the recently-proposed framework of *control improvisation* (CI) to add reactivity. The resulting framework of *reactive control improvisation* provides a natural way to integrate a randomness requirement with the usual functional specifications of reactive synthesis over a finite window. We theoretically characterize when such problems are realizable, and give a general method for solving them. For specifications given by reachability or safety games or by deterministic finite automata, our method yields a polynomial-time synthesis algorithm. For various other types of specifications including temporal logic formulas, we obtain a polynomial-space algorithm and prove matching PSPACE-hardness results. We show that all of these randomized variants of reactive synthesis are no harder in a complexity-theoretic sense than their non-randomized counterparts.

#### **1 Introduction**

Many interesting programs, including protocol handlers, task planners, and concurrent software generally, are *open* systems that interact over time with an external environment. Synthesis of such *reactive systems* requires finding an implementation that satisfies the desired specification no matter what the environment does. This problem, *reactive synthesis*, has a long history (see [7] for a survey). Reactive synthesis from temporal logic specifications [19] has been particularly well-studied and is being increasingly used in applications such as hardware synthesis [3] and robotic task planning [15].

In this paper, we investigate how to synthesize reactive systems with *random behavior*: in fact, systems where *being random in a prescribed way is part of their specification*. This is in contrast to prior work on stochastic games where randomness is used to model uncertain environments or randomized strategies are merely allowed, not required. Solvers for stochastic games may incidentally produce randomized strategies to satisfy a functional specification (and some types of specification, e.g. multi-objective queries [4], may only be realizable by randomized strategies), but do not provide a general way to *enforce* randomness. Unlike most specifications used in reactive synthesis, our randomness requirement is a property of a system's *distribution* of behaviors, not of an individual behavior. While probabilistic specification languages like PCTL [12] can capture some such properties, the simple and natural randomness requirement we study here cannot be concisely expressed by existing languages (even those as powerful as SGL [2]). Thus, *randomized reactive synthesis* in our sense requires significantly different methods than those previously studied.

However, we argue that this type of synthesis is quite useful, because introducing randomness into the behavior of a system can often be beneficial, enhancing *variety*, *robustness*, and *unpredictability*. Example applications include:


Adding randomness to a system in an *ad hoc* way could easily compromise its correctness. This paper shows how a randomness requirement can be integrated *into the synthesis process*, ensuring correctness as well as allowing trade-offs to be explored: how much randomness can be added while staying correct, or how strong can a specification be while admitting a desired amount of randomness?

To formalize randomized reactive synthesis we build on the idea of *control improvisation*, introduced in [6], formalized in [9], and further generalized in [8]. Control improvisation (CI) is the problem of constructing an *improviser*, a probabilistic algorithm which generates finite words subject to three constraints: a *hard constraint* that must always be satisfied, a *soft constraint* that need only be satisfied with some probability, and a *randomness constraint* that no word be generated with probability higher than a given bound. We define *reactive control improvisation* (RCI), where the improviser generates a word incrementally, alternating with an adversarial environment in adding symbols. To perform synthesis in a finite window, we encode functional specifications and environment assumptions into the hard constraint, while the soft and randomness constraints allow us to tune how randomness is added to the system. The improviser obtained by solving the RCI problem is then a solution to the original synthesis problem.

The difficulty of solving reactive CI problems depends on the type of specification. We study several types commonly used in reactive synthesis, including reachability games (and variants, e.g. safety games) and formulas in the temporal logics LTL and LDL [5,18]. We also investigate the specification types studied in [8], showing how the complexity of the CI problem changes when adding reactivity. For every type of specification we obtain a randomized synthesis algorithm whose complexity matches that of ordinary reactive synthesis (in a finite window). This suggests that reactive control improvisation should be feasible in applications like robotic task planning where reactive synthesis tools have proved effective.

In summary, the main contributions of this paper are:


Finally, Sect. 8 summarizes our results and gives directions for future work.

#### **2 Background**

#### **2.1 Notation**

Given an alphabet Σ, we write |w| for the length of a finite word w ∈ Σ^*, λ for the empty word, Σ^n for the words of length n, and Σ^≤n for ∪_{0≤i≤n} Σ^i, the set of all words of length at most n. We abbreviate deterministic/nondeterministic finite automaton by DFA/NFA, and context-free grammar by CFG. For an instance X of any such formalism, which we call a *specification*, we write L(X) for the language (subset of Σ^*) it defines (note the distinction between a language and a representation thereof). We view formulas of Linear Temporal Logic (LTL) [18] and Linear Dynamic Logic (LDL) [5] as specifications using their natural semantics on finite words (see [5]).

We use the standard complexity classes #P and PSPACE, and the PSPACE-complete problem QBF of determining the truth of a quantified Boolean formula. For background on these classes and problems see for example [1].

Some specifications we use as examples are *reachability games* [16], where players' actions cause transitions in a state space and the goal is to reach a target state. We group these games, *safety games* where the goal is to *avoid* a set of states, and *reach-avoid* games combining reachability and safety goals [20], together as *reachability/safety games* (RSGs). We draw reachability games as graphs in the usual way: squares are adversary-controlled states, and states with a double border are target states.

#### **2.2 Synthesis Games**

Reactive control improvisation will be formalized in terms of a 2-player game which is essentially the standard *synthesis game* used in reactive synthesis [7]. However, our formulation is slightly different for compatibility with the definition of control improvisation, so we give a self-contained presentation here.

Fix a finite alphabet Σ. The players of the game will alternate picking symbols from Σ, building up a word. We can then specify the set of winning plays with a language over Σ. To simplify our presentation we assume that players strictly alternate turns and that any symbol from Σ is a legal move. These assumptions can be relaxed in the usual way by modifying the winning set appropriately.

**Finite Words:** While reactive synthesis is usually considered over infinite words, in this paper we focus on synthesis in a finite window, as it is unclear how best to generalize our randomness requirement to the infinite case. This assumption is not too restrictive, as solutions of bounded length are adequate for many applications. In fuzz testing, for example, we do not want to generate arbitrarily long files or sequences of packets. In robotic planning, we often want a plan that accomplishes a task within a certain amount of time. Furthermore, planning problems with liveness specifications can often be segmented into finite pieces: we do not need an infinite route for a patrolling robot, but can plan within a finite horizon and replan periodically. Replanning may even be *necessary* when environment assumptions become invalid. At any rate, we will see that the bounded case of reactive control improvisation is already highly nontrivial.

As a final simplification, we require that all plays have length exactly n ∈ ℕ. To allow a range [m, n] we can simply add a new padding symbol to Σ and extend all shorter words to length n, modifying the winning set appropriately.

**Definition 2.1.** *A* history h *is an element of* Σ^≤n*, representing the moves of the game played so far. We say the game has* ended *after* h *if* |h| = n*; otherwise it is* our turn *after* h *if* |h| *is even, and* the adversary's turn *if* |h| *is odd.*

**Definition 2.2.** *A* strategy *is a function* σ : Σ^≤n × Σ → [0, 1] *such that for any history* h ∈ Σ^≤n *with* |h| < n*,* σ(h, ·) *is a probability distribution over* Σ*. We write* x ← σ(h) *to indicate that* x *is a symbol randomly drawn from* σ(h, ·)*.*

Since strategies are randomized, fixing strategies for both players does not uniquely determine a play of the game, but defines a *distribution* over plays:

**Definition 2.3.** *Given a pair of strategies* (σ, τ)*, we can generate a random* play π ∈ Σ^n *as follows. Pick* π0 ← σ(λ)*, then for* i *from* 1 *to* n − 1 *pick* πi ← τ(π0 ··· πi−1) *if* i *is odd and* πi ← σ(π0 ··· πi−1) *otherwise. Finally, put* π = π0 ··· πn−1*. We write* Pσ,τ(π) *for the probability of obtaining the play* π*. This extends to a* set *of plays* X ⊆ Σ^n *in the natural way:* Pσ,τ(X) = Σ_{π∈X} Pσ,τ(π)*. Finally, the set of* possible *plays is* Πσ,τ = {π ∈ Σ^n | Pσ,τ(π) > 0}*.*

The next definition is just the conditional probability of a play given a history, but works for histories with probability zero, simplifying our presentation.

**Definition 2.4.** *For any history* h = h0 ··· hk−1 ∈ Σ^≤n *and word* ρ ∈ Σ^{n−k}*, we write* Pσ,τ(ρ | h) *for the probability that if we assign* πi = hi *for* i < k *and sample* πk, ..., πn−1 *by the process above, then* πk ··· πn−1 = ρ*.*
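Definition 2.3 can be read directly as a sampling procedure. A minimal Python sketch (function and argument names are ours): each strategy maps a history to a dictionary of symbol probabilities, and the players strictly alternate.

```python
import random

def sample_play(sigma, tau, n, rng=random):
    """Sample a play per Definition 2.3: sigma moves at even positions,
    the adversary tau at odd ones; each maps a history (tuple of symbols)
    to a dict {symbol: probability}."""
    play = []
    for i in range(n):
        strategy = sigma if i % 2 == 0 else tau
        dist = strategy(tuple(play))
        symbols, weights = zip(*dist.items())
        play.append(rng.choices(symbols, weights=weights)[0])
    return tuple(play)
```

Repeated calls with fixed σ and τ sample from the play distribution Pσ,τ.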

#### **3 Problem Definition**

#### **3.1 Motivating Example**

Consider synthesizing a planner for a surveillance drone operating near another, potentially adversarial drone. Discretizing the map into the 7 × 7 grid in Fig. 1 (ignoring the depicted trajectories for the moment), a route is a word over the four movement directions. Our specification is to visit the 4 circled locations in 30 moves without colliding with the adversary, assuming it cannot move into the 5 highlighted central locations.

**Fig. 1.** Improvised trajectories for a patrolling drone (solid) avoiding an adversary (dashed). The adversary may not move into the circles or the square.

Existing reactive synthesis tools can produce a strategy for the patroller ensuring that the specification is always satisfied. However, the strategy may be deterministic, so that in response to a fixed adversary the patroller will always follow the same route. Then it is easy for a third party to predict the route, which could be undesirable, and is in fact unnecessary if there are many other ways the drone can satisfy its specification.

Reactive control improvisation addresses this problem by adding a new type of specification to the *hard constraint* above: a *randomness requirement* stating that no behavior should be generated with probability greater than a threshold ρ. If we set (say) ρ = 1/5, then any controller solving the synthesis problem must be able to satisfy the hard constraint in at least 5 different ways, never producing any given behavior more than 20% of the time. Our synthesis algorithm can in fact compute the smallest ρ for which synthesis is possible, yielding a controller that is *maximally-randomized* in that the system's behavior is as close to a uniform distribution as possible.

To allow finer tuning of how randomness is introduced into the controller, our definition also includes a *soft constraint* which need only be satisfied with some probability 1 − ε. This allows us to prefer certain safe behaviors over others. In our drone example, we require that with probability at least 3/4 we do not visit a circled location twice.

These hard, soft, and randomness constraints form an instance of our reactive control improvisation problem. Encoding the hard and soft constraints as DFAs, our algorithm (Sect. 6) produced a controller achieving the smallest realizable ρ = 2.2 × 10⁻¹². We tested the controller using the PX4 autopilot [17] to refine the generated routes into control actions for a drone simulated in Gazebo [14] (videos and code are available online [11]). A selection of resulting trajectories is shown in Fig. 1 (the remainder in Appendix A of the full paper [10]): starting from the triangles, the patroller's path is solid, the adversary's dashed. The left run uses an adversary that moves towards the patroller when possible. The right runs, with a simple adversary moving in a fixed loop, illustrate the randomness of the synthesized controller.

#### **3.2 Reactive Control Improvisation**

Our formal notion of randomized reactive synthesis in a finite window is a reactive extension of *control improvisation* [8,9], which captures the three types of constraint (hard, soft, randomness) seen above. We use the notation of [8] for the specifications and languages defining the hard and soft constraints:

**Definition 3.1 (**[8]**).** *Given* hard *and* soft *specifications* H *and* S *of languages over* Σ*, an* improvisation *is a word* w ∈ L(H) ∩ Σ^n*. It is* admissible *if* w ∈ L(S)*. The set of all improvisations is denoted* I*, and the set of admissible improvisations* A*.*

*Running Example.* We will use the following simple example throughout the paper: each player may increment (+), decrement (−), or leave unchanged (=) a counter which is initially zero. The alphabet is Σ = {+, −, =}, and we set n = 4. The hard specification H is the DFA in Fig. 2 requiring that the counter stay within [−2, 2]. The soft specification S is a similar DFA requiring that the counter end at a nonnegative value.

Then for example the word ++== is an admissible improvisation, satisfying both hard and soft constraints, and so is in A. The word +−=− on the other hand satisfies H but not S, so it is in I but not A. Finally, +++− does not satisfy H, so it is not an improvisation at all and is not in I.
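These membership claims are easy to check mechanically. A small sketch (helper names are ours; the paper gives H and S as DFAs, which we replace here by direct counter checks):

```python
DELTA = {'+': 1, '-': -1, '=': 0}

def satisfies_hard(word):
    """H: the running counter stays within [-2, 2] at every step."""
    c = 0
    for ch in word:
        c += DELTA[ch]
        if abs(c) > 2:
            return False
    return True

def final_counter(word):
    """Counter value after the whole word (S requires it nonnegative)."""
    return sum(DELTA[ch] for ch in word)
```

A word of length n = 4 is in I iff `satisfies_hard` holds, and in A iff additionally `final_counter` is nonnegative.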

A reactive control improvisation problem is defined by H, S, and parameters ε and ρ. A solution is then a strategy which ensures that the hard, soft, and randomness constraints hold against every adversary. Formally, following [8,9]:

**Fig. 2.** The hard specification DFA H in our running example. The soft specification S is the same but with only the shaded states accepting.

**Definition 3.2.** *Given an* RCI instance C = (H, S, n, ε, ρ) *with* H*,* S*, and* n *as above and* ε, ρ ∈ [0, 1] ∩ ℚ*, a strategy* σ *is an* improvising strategy *if it satisfies the following requirements for every adversary* τ*:*

**Hard constraint:** Pσ,τ(I) = 1
**Soft constraint:** Pσ,τ(A) ≥ 1 − ε
**Randomness:** ∀π ∈ I*,* Pσ,τ(π) ≤ ρ*.*

*If there is an improvising strategy* σ*, we say that* C *is* realizable*. An* improviser *for* C *is then an expected-finite-time probabilistic algorithm implementing such a strategy* σ*, i.e., one whose output distribution on input* h ∈ Σ^≤n *is* σ(h, ·)*.*

**Definition 3.3.** *Given an RCI instance* C = (H, S, n, ε, ρ)*, the* reactive control improvisation *(RCI) problem is to decide whether* C *is realizable, and if so to generate an improviser for* C*.*

*Running Example.* Suppose we set ε = 1/2 and ρ = 1/2. Let σ be the strategy which picks + or − with equal probability in the first move, and thenceforth picks the action which moves the counter closest to +1 or −1, respectively. This satisfies the hard constraint, since if the adversary ever moves the counter to ±2 we immediately move it back. The strategy also satisfies the soft constraint, since with probability 1/2 we set the counter to +1 on the first move, and if the adversary moves it to 0 we move back to +1 and remain nonnegative. Finally, σ also satisfies the randomness constraint, since each choice of first move happens with probability 1/2, so no play can be generated with higher probability. So σ is an improvising strategy and this RCI instance is realizable.
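The strategy σ described above can be simulated against concrete adversaries. The sketch below (function names and the particular adversary are ours) fixes the first move, plays the remaining rounds, and returns the counter value after each move:

```python
DELTA = {'+': 1, '-': -1, '=': 0}

def simulate(first, adversary):
    """Play n = 4 moves: our first move is fixed (+ or -); afterwards we
    steer toward the matching target +1/-1 at even positions, while the
    adversary (a function of the current counter) moves at odd ones."""
    target = DELTA[first]
    c, trace = 0, []
    for i in range(4):
        if i == 0:
            action = first
        elif i % 2 == 0:
            # pick the action bringing the counter closest to the target
            action = min('+-=', key=lambda a: abs(c + DELTA[a] - target))
        else:
            action = adversary(c)
        c += DELTA[action]
        trace.append(c)
    return trace
```

Against an adversary that always pushes the counter away from zero, the counter never leaves [−2, 2], matching the argument above.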

We will study classes of RCI problems with different types of specifications:

**Definition 3.4.** *If* HSpec *and* SSpec *are classes of specifications, then the class of RCI instances* C = (H, S, n, ε, ρ) *where* H ∈ HSpec *and* S ∈ SSpec *is denoted* RCI (HSpec, SSpec)*. We use the same notation for the decision problem associated with the class, i.e., given* C ∈ RCI (HSpec, SSpec)*, decide whether* C *is realizable. The* size |C| *of an RCI instance is the total size of the bit representations of its parameters, with* n *represented in unary and* ε, ρ *in binary.*

Finally, a *synthesis algorithm* in our context takes a specification in the form of an RCI instance and produces an implementation in the form of an improviser. This corresponds exactly to the notion of an improvisation scheme from [8]:

**Definition 3.5 (**[8]**).** *A* polynomial-time improvisation scheme *for a class* P *of RCI instances is an algorithm* S *with the following properties:*

*(1)* S *runs in time polynomial in* |C| *on any* C ∈ P*;*
*(2) if* C ∈ P *is realizable,* S(C) *is an improviser for* C*, and otherwise* S(C) = ⊥*;*
*(3) there is a polynomial* p *such that if* G = S(C) ≠ ⊥*, then* G *has expected runtime at most* p(|C|)*.*
The first two requirements simply say that the scheme produces valid improvisers in polynomial time. The third is necessary to ensure that the improvisers themselves are efficient: otherwise, the scheme might for example produce improvisers running in time exponential in the size of the specification.

A main goal of our paper is to determine for which types of specifications there exist polynomial-time improvisation schemes. While we do find such algorithms for important classes of specifications, we will also see that determining the realizability of an RCI instance is often PSPACE-hard. Therefore we also consider *polynomial-space improvisation schemes*, defined as above but replacing time with space.

#### **4 Existence of Improvisers**

#### **4.1 Width and Realizability**

The most basic question in reactive synthesis is whether a specification is realizable. In *randomized* reactive synthesis, the question is more delicate because the randomness requirement means that it is no longer enough to ensure some property regardless of what the adversary does: there must be *many ways* to do so. Specifically, there must be at least 1/ρ improvisations if we are to generate each of them with probability at most ρ. Furthermore, at least this many improvisations must be *possible* given an unknown adversary: even if many exist, the adversary may be able to force us to use only a single one. We introduce a new notion of the size of a set of plays that takes this into account.

**Definition 4.1.** *The* width *of* X ⊆ Σn *is* W(X) = maxσ minτ |X ∩ Πσ,τ |*.*

The width counts how many distinct plays can be generated regardless of what the adversary does. Intuitively, a "narrow" game—one whose set of winning plays has small width—is one in which the adversary can force us to choose among only a few winning plays, while in a "wide" one we always have many safe choices available. Note that *which* particular plays can be generated depends on the adversary: the width only measures *how many* can be generated. For example, W(X) = 1 means that a play in X can always be generated, but possibly a different element of X for different adversaries.

**Fig. 3.** Synthesis game for our running example. States are labeled with the widths of I (left) and A (right) given a history ending at that state.

*Running Example.* Figure 3 shows the synthesis game for our running example: paths ending in circled or shaded states are plays in I or A respectively (ignore the state labels for now). At left, the bold arrows show the 4 plays in I possible against the adversary that moves away from 0, and down at 0. This shows W(I) ≤ 4, and in fact 4 plays are possible against any adversary, so W(I) = 4. Similarly, at right we see that W(A) = 1.
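To make widths concrete, the following sketch recomputes W(I) and W(A) for the running example by brute-force recursion over histories. The encoding of the game is our reconstruction from the figures, not spelled out in this excerpt: length-4 plays over moves {+, =, −}, we move on even turns, the hard constraint keeps the counter in [−2, 2] at every step, and the soft constraint additionally keeps it nonnegative.

```python
N = 4                # plays have length 4; we move at even turns, the adversary at odd ones
MOVES = (1, 0, -1)   # the actions +, =, - encoded as counter increments (our assumption)

def within_hard(c):  # assumed hard spec: counter always stays in [-2, 2]
    return -2 <= c <= 2

def within_soft(c):  # assumed soft spec: counter additionally stays nonnegative
    return 0 <= c <= 2

def width(ok, c=0, i=0):
    """W(X|h) for X = plays whose every position satisfies `ok`, where the
    history h has length i and has driven the counter to value c."""
    if i == N:
        return 1
    # a move violating `ok` leaves 0 plays of X available from the new position
    vals = [width(ok, c + d, i + 1) if ok(c + d) else 0 for d in MOVES]
    return sum(vals) if i % 2 == 0 else min(vals)  # our turn: sum; adversary's: min
```

Under this encoding the recursion reproduces the widths read off from Fig. 3: `width(within_hard)` gives W(I) = 4 and `width(within_soft)` gives W(A) = 1.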

It will be useful later to have a *relative* version of width that counts how many plays are possible *from a given position*:

**Definition 4.2.** *Given a set of plays* X ⊆ Σn *and a history* h ∈ Σ≤n*, the* width of X given h *is* W(X|h) = maxσ minτ |{π | hπ ∈ X ∧ Pσ,τ (π|h) > 0}|*.*

This is a direct generalization of "winning" positions: if X is the set of winning plays, then W(X|h) counts the number of ways to win from h.

We will often use the following basic properties of W(X|h) without comment (for lack of space this proof and the details of later proof sketches are deferred to Appendix B of the full paper [10]). Note that (3)–(5) provide a recursive way to compute widths that we will use later, and which is illustrated by the state labels in Fig. 3.

**Lemma 4.1.** *For any set of plays* X ⊆ Σn *and history* h ∈ Σ≤n*:*

*(1)* 0 ≤ W(X|h) ≤ |Σ|^(n−|h|)*;*
*(2)* W(X|λ) = W(X)*;*
*(3) if* |h| = n*, then* W(X|h) = 1h∈X*;*
*(4) if it is our turn after* h*, then* W(X|h) = Σu∈Σ W(X|hu)*;*
*(5) if it is the adversary's turn after* h*, then* W(X|h) = minu∈Σ W(X|hu)*.*
Now we can state the realizability conditions, which are simply that I and A have sufficiently large width. In fact, the conditions turn out to be exactly the same as those for non-reactive CI except that width takes the place of size [9].

**Theorem 4.1.** *The following are equivalent:*

*(1)* C *is realizable.*

*(2)* W(I) ≥ 1/ρ *and* W(A) ≥ (1 − ε)/ρ*.*

*(3) There is an improviser for* C*.*

*Running Example.* We saw above that our example was realizable with ε = ρ = 1/2, and indeed 4 = W(I) ≥ 1/ρ = 2 and 1 = W(A) ≥ (1 − ε)/ρ = 1. However, if we put ρ = 1/3 we violate the second inequality and the instance is not realizable: essentially, we need to distribute probability 1 − ε = 1/2 among plays in A (to satisfy the soft constraint), but since W(A) = 1, against some adversaries we can only generate one play in A and would have to give it the whole 1/2 (violating the randomness requirement).

The difficult part of the theorem is constructing an improviser when the inequalities in (2) hold. Despite the similarity of these conditions to the non-reactive case, the construction is much more involved. We begin with a general overview.

#### **4.2 Improviser Construction: Discussion**

Our improviser can be viewed as an extension of the classical random-walk reduction of uniform sampling to counting [21]. In that algorithm (which was used in a similar way for DFA specifications in [8,9]), a uniform distribution over paths in a DAG is obtained by moving to the next vertex with probability proportional to the number of paths originating at it. In our case, which plays are possible depends on the adversary, but the width still tells us *how many* plays are possible. So we could try a random walk using widths as weights: e.g. on the first turn in Fig. 3, picking +, −, and = with probabilities 1/4, 2/4, and 1/4 respectively. Against the adversary shown in Fig. 3, this would indeed yield a uniform distribution over the four possible plays in I.
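The classical random-walk sampler can be sketched as follows on a toy DAG of our own (the vertex names and edge set are illustrative assumptions, not from the paper; in [8,9] the DAG is the unrolled DFA):

```python
import random
from fractions import Fraction
from functools import lru_cache

# A toy DAG; every maximal path from "s" should be sampled uniformly.
EDGES = {"s": ["a", "b"], "a": ["t"], "b": ["t", "u"], "t": [], "u": []}

@lru_cache(maxsize=None)
def npaths(v):
    """Number of maximal paths originating at v (the vertex's weight)."""
    return 1 if not EDGES[v] else sum(npaths(w) for w in EDGES[v])

def sample_path(v="s"):
    """Random walk: move to each successor with probability
    proportional to the number of paths originating at it."""
    path = [v]
    while EDGES[v]:
        v = random.choices(EDGES[v], weights=[npaths(w) for w in EDGES[v]])[0]
        path.append(v)
    return path

def path_probability(path):
    """Exact probability that sample_path returns `path`;
    by construction it is always 1/npaths(source)."""
    p = Fraction(1)
    for v, w in zip(path, path[1:]):
        p *= Fraction(npaths(w), sum(npaths(x) for x in EDGES[v]))
    return p
```

Here npaths("s") = 3, and every maximal path, e.g. s→a→t or s→b→u, is produced with probability exactly 1/3 regardless of its length.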

However, the soft constraint may require a non-uniform distribution. In the running example with ε = ρ = 1/2, we need to generate the single possible play in A with probability 1/2, not just the uniform probability 1/4. This is easily fixed by doing the random walk with a *weighted average* of the widths of I and A: specifically, move to position h with probability proportional to αW(A|h) + β(W(I|h) − W(A|h)). In the example, this would result in plays in A getting probability α and those in I \ A getting probability β. Taking α sufficiently large, we can ensure the soft constraint is satisfied.

Unfortunately, this strategy can fail if the adversary makes *more* plays available than the width guarantees. Consider the game on the left of Fig. 4, where W(I) = 3 and W(A) = 2. This is realizable with ε = ρ = 1/3, but no values of α and β yield improvising strategies, essentially because an adversary moving from X to Z breaks the worst-case assumption that the adversary will minimize the number of possible plays by moving to Y . In fact, this instance is realizable but not by any memoryless strategy. To see this, note that all such strategies can be parametrized by the probabilities p and q in Fig. 4. To satisfy the randomness

**Fig. 4.** Reachability games where a naïve random walk, and all memoryless strategies, fail (left) and where no strategy can optimize either ε or ρ against every adversary simultaneously (right).

constraint against the adversary that moves from X to Y , both p and (1 − p)q must be at most 1/3. To satisfy the soft constraint against the adversary that moves from X to Z we must have pq + (1 − p)q ≥ 2/3, so q ≥ 2/3. But then (1 − p)q ≥ (1 − 1/3)(2/3) = 4/9 > 1/3, a contradiction.

To fix this problem, our improvising strategy σ̂ (which we will fully specify in Algorithm 1 below) takes a simplistic approach: it tracks how many plays in A and I are expected to be possible based on their widths, and if more are available it ignores them. For example, entering state Z from X there are 2 ways to produce a play in I, but since W(I|X) = 1 we ignore the play in I \ A. Extra plays in A are similarly ignored by being treated as members of I \ A. Ignoring unneeded plays may seem wasteful, but the proof of Theorem 4.1 will show that σ̂ nevertheless achieves the best possible ε:

**Corollary 4.1.** C *is realizable iff* W(I) ≥ 1/ρ *and* ε ≥ εopt ≡ max(1 − ρW(A), 0)*. Against any adversary, the error probability of Algorithm 1 is at most* εopt*.*

Thus, if *any* improviser can achieve an error probability ε, ours does. We could ask for a stronger property, namely that against each adversary the improviser achieves the smallest possible error probability *for that adversary*. Unfortunately, this is impossible in general. Consider the game on the right in Fig. 4, with ρ = 1. Against the adversary which always moves up, we can achieve ε = 0 with the strategy that at P moves to Q. We can also achieve ε = 0 against the adversary that always moves down, but only with a *different* strategy, namely the one that at P moves to R. So there is no single strategy that achieves the optimal ε for every adversary. A similar argument shows that there is also no strategy achieving the smallest possible ρ for every adversary. In essence, optimizing ε or ρ in every case would require the strategy to depend on the adversary.

#### **4.3 Improviser Construction: Details**

Our improvising strategy, as outlined in the previous section, is shown in Algorithm 1. We first compute α and β, the (maximum) probabilities for generating elements of A and I \ A respectively. As in [8], we take α as large as possible given α ≤ ρ, and determine β from the probability left over (modulo a couple of corner cases).

#### **Algorithm 1.** The strategy σ̂

1: α ← min(ρ, 1/W(A)) (or 0 instead if W(A) = 0)
2: β ← (1 − αW(A))/(W(I) − W(A)) (or 0 instead if W(I) − W(A) = 0)
3: mA ← W(A), mI ← W(I)
4: h ← λ
5: **while** the game is not over after h **do**
6:   **if** it is our turn after h **then**
7:     (mA,u, mI,u)u∈Σ ← Partition(mA, mI , h) ▷ returns values for each u ∈ Σ
8:     for each u ∈ Σ, put tu ← αmA,u + β(mI,u − mA,u)
9:     pick u ∈ Σ with probability proportional to tu and append it to h
10:    mA ← mA,u, mI ← mI,u
11:  **else**
12:    the adversary picks u ∈ Σ given the history h; append it to h
13: **return** h
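As a concrete sketch, the following traces the setup of Algorithm 1 for the running example: it computes α, β, and the first-move weights tu, assuming our reconstruction of the game (counter within [−2, 2] at every step for the hard constraint, additionally nonnegative for the soft one) and brute-force width oracles rather than the efficient scheme of Sect. 6.

```python
from fractions import Fraction

N = 4
MOVES = {"+": 1, "=": 0, "-": -1}     # assumed encoding of the actions
hard = lambda c: -2 <= c <= 2         # assumed hard spec H
soft = lambda c: 0 <= c <= 2          # assumed soft spec S (so A = plays satisfying both)

def width(ok, c=0, i=0):
    """Brute-force W(X|h): sum on our (even) turns, min on the adversary's."""
    if i == N:
        return 1
    vals = [width(ok, c + d, i + 1) if ok(c + d) else 0 for d in MOVES.values()]
    return sum(vals) if i % 2 == 0 else min(vals)

def first_move_weights(eps, rho):
    WI, WA = width(hard), width(soft)
    alpha = min(rho, Fraction(1, WA)) if WA else Fraction(0)
    beta = (1 - alpha * WA) / (WI - WA) if WI != WA else Fraction(0)
    # on the first move mA, mI equal the widths, so Partition
    # simply returns (W(A|u), W(I|u)) for each action u
    t = {}
    for u, d in MOVES.items():
        wa = width(soft, d, 1) if soft(d) else 0
        wi = width(hard, d, 1) if hard(d) else 0
        t[u] = alpha * wa + beta * (wi - wa)
    return t

t = first_move_weights(Fraction(1, 2), Fraction(1, 2))
```

With ε = ρ = 1/2 this yields α = 1/2, β = 1/6 and the first-move weights t+ = 1/2, t= = 1/3, t− = 1/6 quoted in the running example below; since they sum to 1 they are already normalized.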

**Fig. 5.** A run of Algorithm 1, labeling states with corresponding widths of I (left) and A (right).

Next we initialize mA and mI , our expectations for how many plays in A and I respectively are still possible to generate. Initially these are given by W(A) and W(I), but as we saw above it is possible for more plays to become available. The function Partition handles this, deciding which mA (resp., mI ) out of the available W(A|h) (resp., W(I|h)) plays we will use. The behavior of Partition is defined by the following lemma; its proof (in Appendix B [10]) greedily takes the first mA possible plays in A under some canonical order and the first mI − mA of the remaining plays in I.

**Lemma 4.2.** *If it is our turn after* h ∈ Σ≤n*, and* mA, mI ∈ ℤ *satisfy* 0 ≤ mA ≤ mI ≤ W(I|h) *and* mA ≤ W(A|h)*, there are integer partitions* mA = Σu∈Σ mA,u *and* mI = Σu∈Σ mI,u *such that* 0 ≤ mA,u ≤ mI,u ≤ W(I|hu) *and* mA,u ≤ W(A|hu) *for all* u ∈ Σ*. These are computable in polynomial time given oracles for* W(I|·) *and* W(A|·)*.*
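One possible greedy realization of such a partition is sketched below (the paper's proof uses a canonical order on plays, which we do not reproduce); the example call uses the widths after the history == in the running example.

```python
def partition(m_A, m_I, w_A, w_I):
    """Greedy sketch of Partition (Lemma 4.2). w_A[u], w_I[u] are the widths
    W(A|hu), W(I|hu); since it is our turn they sum to W(A|h), W(I|h).
    Requires 0 <= m_A <= m_I <= sum(w_I), m_A <= sum(w_A), and
    w_A[u] <= w_I[u] for all u (which holds since A is a subset of I).
    Returns dicts a, b with sum(a) = m_A, sum(b) = m_I,
    0 <= a[u] <= b[u] <= w_I[u], and a[u] <= w_A[u]."""
    a, b = {}, {}
    rem = m_A
    for u in w_A:                        # greedily place the m_A plays in A
        a[u] = min(rem, w_A[u])
        rem -= a[u]
    extra = m_I - m_A
    for u in w_I:                        # then distribute the m_I - m_A plays of I \ A
        b[u] = a[u] + min(extra, w_I[u] - a[u])
        extra -= b[u] - a[u]
    return a, b

# widths after the history "==" in the running example (our reading of Fig. 5):
a, b = partition(0, 2, {"+": 1, "=": 0, "-": 0}, {"+": 1, "=": 1, "-": 1})
```

On this input the greedy scheme returns (mA,u, mI,u) = (0, 1) for u ∈ {+, =} and (0, 0) for u = −, matching the discard described in the running example below.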

Finally, we perform the random walk, moving from position h to hu with (unnormalized) probability tu, the weighted average described above.

*Running Example.* With ε = ρ = 1/2, as before W(A) = 1 and W(I) = 4, so α = 1/2 and β = 1/6. On the first move, mA and mI match W(A|h) and W(I|h), so all plays are used and Partition returns (W(A|hu), W(I|hu)) for each u ∈ Σ. Looking up these values in Fig. 5, we see (mA,=, mI,=) = (0, 2) and so t= = 2β = 1/3. Similarly t+ = α = 1/2 and t− = β = 1/6. We choose an action according to these weights; suppose we pick =, so that we update mA ← 0 and mI ← 2, and suppose the adversary responds with =. From Fig. 5, W(A| ==) = 1 and W(I| ==) = 3, whereas mA = 0 and mI = 2. So Partition discards a play, say returning (mA,u, mI,u) = (0, 1) for u ∈ {+, =} and (0, 0) for u = −. Then t+ = t= = β = 1/6 and t− = 0. So we pick + or = with equal probability, say +. If the adversary responds with +, we get the play ==++, shown in bold in Fig. 5. As desired, it satisfies the hard constraint.

The next few lemmas establish that σ̂ is well-defined and in fact an improvising strategy, allowing us to prove Theorem 4.1. Throughout, we write mA(h) (resp., mI (h)) for the value of mA (mI ) at the start of the iteration for history h. We also write t(h) = αmA(h) + β(mI (h) − mA(h)) (so t(hu) = tu when we pick u).

**Lemma 4.3.** *If* W(I) ≥ 1/ρ*, then* σ̂ *is a well-defined strategy and* Pσ̂,τ (I) = 1 *for every adversary* τ*.*

*Proof (sketch).* An easy induction on h shows that the conditions of Lemma 4.2 are always satisfied, and that t(h) is always positive since we never pick a u with tu = 0. So Σu tu = t(h) > 0 and σ̂ is well-defined. Furthermore, t(h) > 0 implies mI (h) > 0, so for any h ∈ Πσ̂,τ we have 1h∈I = W(I|h) ≥ mI (h) > 0 and thus h ∈ I.

**Lemma 4.4.** *If* W(I) ≥ 1/ρ*, then* Pσ̂,τ (A) ≥ min(ρW(A), 1) *for every* τ*.*

*Proof (sketch).* Because of the αmA(h) term in the weights t(h), the probability of obtaining a play in A starting from h is at least αmA(h)/t(h) (as can be seen by induction on h in order of decreasing length). Then since mA(λ) = W(A) and t(λ) = 1 we have Pσ̂,τ (A) ≥ αW(A) = min(ρW(A), 1).

**Lemma 4.5.** *If* W(I) ≥ 1/ρ*, then* Pσ̂,τ (π) ≤ ρ *for every* π ∈ Σn *and* τ*.*

*Proof (sketch).* If the adversary is deterministic, the weights we use for our random walk yield a distribution where each play π has probability either α or β (depending on whether mA(π) = 1 or 0). If the adversary assigns nonzero probability to multiple choices, this only decreases the probability of individual plays. Finally, since W(I) ≥ 1/ρ we have α, β ≤ ρ.

*Proof (of Theorem* 4.1*).* We use a similar argument to that of [8].

**(1)**⇒**(2)** Fix an improvising strategy σ and any adversary τ. By the randomness constraint every play has probability at most ρ, and Pσ,τ (I) = 1, so at least 1/ρ distinct plays in I must be possible, i.e., |I ∩ Πσ,τ | ≥ 1/ρ; since τ was arbitrary, W(I) ≥ 1/ρ. Likewise, Pσ,τ (A) ≥ 1 − ε forces at least (1 − ε)/ρ distinct plays in A to be possible against every τ, so W(A) ≥ (1 − ε)/ρ.

**(2)**⇒**(3)** By Lemmas 4.3 and 4.5, σ̂ satisfies the hard and randomness constraints. Since ρW(A) ≥ 1 − ε, Lemma 4.4 gives Pσ̂,τ (A) ≥ min(ρW(A), 1) ≥ 1 − ε, so the soft constraint holds as well. Thus σ̂ is an improvising strategy, and it can be implemented by an expected-finite-time probabilistic algorithm (see Sect. 5).

**(3)**⇒**(1)** Immediate.

*Proof (of Corollary* 4.1*).* The inequalities in the statement are equivalent to those of Theorem 4.1 (2). By Lemma 4.4, we have Pσ̂,τ (A) ≥ min(ρW(A), 1). So the error probability is at most 1 − min(ρW(A), 1) = εopt.

#### **5 A Generic Improviser**

We now use the construction of Sect. 4 to develop a generic improvisation scheme usable with any class of specifications Spec supporting the following operations:

**Intersection:** Given specs X and Y, find Z such that L(Z) = L(X ) ∩ L(Y).

**Width Measurement:** Given a specification X , a length n ∈ ℕ in unary, and a history h ∈ Σ≤n, compute W(X|h) where X = L(X ) ∩ Σn.

Efficient algorithms for these operations lead to efficient improvisation schemes:

**Theorem 5.1.** *If the operations on* Spec *above take polynomial time (resp. space), then* RCI (Spec, Spec) *has a polynomial-time (space) improvisation scheme.*

*Proof.* Given an instance C = (H, S, n, ε, ρ) in RCI (Spec, Spec), we first apply intersection to H and S to obtain A ∈ Spec such that L(A) ∩ Σn = A. Since intersection takes polynomial time (space), A has size polynomial in |C|. Next we use width measurement to compute W(I) = W(L(H) ∩ Σn|λ) and W(A) = W(L(A) ∩ Σn|λ). If these violate the inequalities in Theorem 4.1, then C is not realizable and we return ⊥. Otherwise C is realizable, and σ̂ above is an improvising strategy. Furthermore, we can construct an expected-finite-time probabilistic algorithm implementing σ̂, using width measurement to instantiate the oracles needed by Lemma 4.2. Determining mA(h) and mI (h) takes O(n) invocations of Partition, each of which is poly-time relative to the width measurements. These take time (space) polynomial in |C|, since H and A have size polynomial in |C|. As mA, mI ≤ |Σ|^n, they have polynomial bitwidth, and so the arithmetic required to compute tu for each u ∈ Σ takes polynomial time. Therefore the total expected runtime (space) of the improviser is polynomial.

Note that as a byproduct of testing the inequalities in Theorem 4.1, our algorithm can compute the best possible error probability εopt given H, S, and ρ (see Corollary 4.1). Alternatively, given ε, we can compute the best possible ρ.

We will see below how to efficiently compute widths for DFAs, so Theorem 5.1 yields a polynomial-time improvisation scheme. If we allow polynomial-*space* schemes, we can use a general technique for width measurement that only requires a very weak assumption on the specifications, namely testability in polynomial space:

**Theorem 5.2.** RCI (PSA, PSA) *has a polynomial-space improvisation scheme, where* PSA *is the class of polynomial-space decision algorithms.*

*Proof (sketch).* We apply Theorem 5.1, computing widths recursively using properties (3)–(5) of Lemma 4.1. As in the PSPACE QBF algorithm, the current path of the recursion tree and the required auxiliary storage need only polynomial space.

#### **6 Reachability Games and DFAs**

Now we develop a polynomial-time improvisation scheme for RCI instances with DFA specifications. This also provides a scheme for reachability/safety games, whose winning conditions can be straightforwardly encoded as DFAs.

Suppose D is a DFA with states V , accepting states T, and transition function δ : V × Σ → V . Our scheme is based on the fact that W(L(D)|h) depends only on the state of D reached on input h, allowing these widths to be computed by dynamic programming. Specifically, for all v ∈ V and i ∈ {0,...,n} we define:

$$C(v, i) = \begin{cases} \mathbb{1}_{v \in T} & i = n \\ \min_{u \in \Sigma} C(\delta(v, u), i + 1) & i < n \wedge i \text{ odd} \\ \sum_{u \in \Sigma} C(\delta(v, u), i + 1) & \text{otherwise} \end{cases}$$

*Running Example.* Figure 6 shows the values C(v, i) in rows from i = n downward. For example, i = 2 is our turn, so C(1, 2) = C(0, 3) + C(1, 3) + C(2, 3) = 1+1+0 = 2, while i = 3 is the adversary's turn, so C(−3, 3) = min{C(−3, 4)} = min{0} = 0. Note that the values in Fig. 6 agree with the widths W(I|h) shown in Fig. 5.

**Lemma 6.1.** *For any history* h ∈ Σ≤n*, writing* X = L(D) ∩ Σn*, we have* W(X|h) = C(D(h), |h|)*, where* D(h) *is the state reached by running* D *on* h*.*

*Proof.* We prove this by induction on i = |h| in decreasing order. In the base case i = n, we have W(X|h) = 1h∈X = 1D(h)∈T = C(D(h), n). Now take any history h ∈ Σ≤n with |h| = i < n. By hypothesis, for any u ∈ Σ we have W(X|hu) = C(D(hu), i + 1). If it is our turn after h, then W(X|h) = Σu∈Σ W(X|hu) = Σu∈Σ C(D(hu), i + 1) = C(D(h), i) as desired. If instead it is the adversary's turn after h, then W(X|h) = minu∈Σ W(X|hu) = minu∈Σ C(D(hu), i + 1) = C(D(h), i), again as desired. So by induction the hypothesis holds for any i.

**Fig. 6.** The hard specification DFA H in our running example, showing how W(I|h) is computed.

**Theorem 6.1.** RCI (DFA, DFA) *has a polynomial-time improvisation scheme.*

*Proof.* We implement Theorem 5.1. Intersection can be done with the standard product construction. For width measurement we compute the quantities C(v, i) by dynamic programming (from i = n down to i = 0) and apply Lemma 6.1.
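The dynamic program can be sketched as follows, under an assumed DFA encoding of the running example's hard constraint (states are counter values in [−2, 2] plus an absorbing dead state; this reconstruction is ours, not spelled out in this excerpt):

```python
N = 4
DELTA = {"+": 1, "=": 0, "-": -1}   # assumed action encoding
DEAD = "dead"                        # absorbing non-accepting state

def step(v, u):
    """Transition function of the assumed hard-spec DFA: track the counter,
    moving to the dead state once it ever leaves [-2, 2]."""
    if v == DEAD:
        return DEAD
    w = v + DELTA[u]
    return w if -2 <= w <= 2 else DEAD

STATES = list(range(-2, 3)) + [DEAD]
ACCEPT = set(range(-2, 3))

# C(v, i) computed bottom-up, from i = n down to i = 0
C = {N: {v: int(v in ACCEPT) for v in STATES}}
for i in range(N - 1, -1, -1):
    agg = sum if i % 2 == 0 else min    # our turn: sum; adversary's turn: min
    C[i] = {v: agg(C[i + 1][step(v, u)] for u in DELTA) for v in STATES}
```

Under this encoding the table reproduces the values discussed for Fig. 6, e.g. C(1, 2) = 2 and C(2, 3) = 0, and the root value C(0, 0) = 4 agrees with W(I) from the running example.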

#### **7 Temporal Logics and Other Specifications**

In this section we analyze the complexity of reactive control improvisation for specifications in the popular temporal logics LTL and LDL. We also look at NFA and CFG specifications, previously studied for non-reactive CI [8], to see how their complexities change in the reactive case.

For LTL specifications, reactive control improvisation is PSPACE-hard because this is already true of ordinary reactive synthesis in a finite window (we suspect this has been observed but could not find a proof in the literature).

**Theorem 7.1.** *Finite-window reactive synthesis for* LTL *is* PSPACE*-hard.*

*Proof (sketch).* Given a QBF φ = ∃x∀y...χ, we can view assignments to its variables as traces over a single proposition. In polynomial time we can construct an LTL formula ψ whose models are the satisfying assignments of χ. Then there is a winning strategy to generate a play satisfying ψ iff φ is true.

**Corollary 7.1.** RCI (LTL, Σ∗) *and* RCI (Σ∗, LTL) *are* PSPACE*-hard.*

This is perhaps disappointing, but is an inevitable consequence of LTL subsuming Boolean formulas. On the other hand, our general polynomial-space scheme applies to LTL and its much more expressive generalization LDL:

**Theorem 7.2.** RCI (LDL, LDL) *has a polynomial-space improvisation scheme.*

*Proof.* This follows from Theorem 5.2, since satisfaction of an LDL formula by a finite word can be checked in polynomial time (e.g., by combining dynamic programming on subformulas with a regular expression parser).

Thus for temporal logics polynomial-time algorithms are unlikely, but adding randomization to reactive synthesis does not increase its complexity.

The same is true for NFA and CFG specifications, where it is again PSPACE-hard to find even a single winning strategy:

**Theorem 7.3.** *Finite-window reactive synthesis for* NFA*s is* PSPACE*-hard.*

*Proof (sketch).* Reduce from QBF as in Theorem 7.1, constructing an NFA accepting the satisfying assignments of χ (as done in [13]).

**Corollary 7.2.** RCI (NFA, Σ∗) *and* RCI (Σ∗, NFA) *are* PSPACE*-hard.*

**Theorem 7.4.** RCI (CFG, CFG) *has a polynomial-space improvisation scheme.*

*Proof.* By Theorem 5.2, since CFG parsing can be done in polynomial time.

Since NFAs can be converted to CFGs in polynomial time, this completes the picture for the kinds of CI specifications previously studied. In non-reactive CI, DFA specifications admit a polynomial-time improvisation scheme while for NFAs/CFGs the CI problem is #P-equivalent [8]. Adding reactivity, DFA specifications remain polynomial-time while NFAs and CFGs move up to PSPACE.

**Table 1.** Complexity of the reactive control improvisation problem for various types of hard and soft specifications H, S. Here PSPACE indicates that checking realizability is PSPACE-hard, and that there is a polynomial-space improvisation scheme.


#### **8 Conclusion**

In this paper we introduced *reactive control improvisation* as a framework for modeling reactive synthesis problems where random but controlled behavior is desired. RCI provides a natural way to tune the amount of randomness while ensuring that safety or other constraints remain satisfied. We showed that RCI problems can be efficiently solved in many cases occurring in practice, giving a polynomial-time improvisation scheme for reachability/safety or DFA specifications. We also showed that RCI problems with specifications in LTL or LDL, popularly used in planning, have the PSPACE-hardness typical of bounded games, and gave a matching polynomial-space improvisation scheme. This scheme generalizes to any specification checkable in polynomial space, including NFAs, CFGs, and many more expressive formalisms. Table 1 summarizes these results.

These results show that, at a high level, finding a maximally-randomized strategy using RCI is no harder than finding any winning strategy at all: for specifications yielding games solvable in polynomial time (respectively, space), we gave polynomial-time (space) improvisation schemes. We therefore hope that in applications where ordinary reactive synthesis has proved tractable, our notion of randomized reactive synthesis will also. In particular, we expect our DFA scheme to be quite practical, and are experimenting with applications in robotic planning. On the other hand, our scheme for temporal logic specifications seems unlikely to be useful in practice without further refinement. An interesting direction for future work would be to see if modern solvers for quantified Boolean formulas (QBF) could be leveraged or extended to solve these RCI problems. This could be useful even for DFA specifications, as conjoining many simple properties can lead to exponentially-large automata. Symbolic methods based on constraint solvers would avoid such blow-up.

We are also interested in extending the RCI problem definition to unbounded or infinite words, as typically used in reactive synthesis. These extensions, as well as that to continuous signals, would be useful in robotic planning, cyberphysical system testing, and other applications. However, it is unclear how best to adapt our randomness constraint to settings where the improviser can generate infinitely many words. In such settings the improviser could assign arbitrarily small or even zero probability to every word, rendering the randomness constraint trivial. Even in the bounded case, RCI extensions with more complex randomness constraints than a simple upper bound on individual word probabilities would be worthy of study. One possibility would be to more directly control diversity and/or unpredictability by requiring the distribution of the improviser's output to be close to uniform after transformation by a given function.

**Acknowledgements.** The authors would like to thank Markus Rabe, Moshe Vardi, and several anonymous reviewers for helpful discussions and comments, and Ankush Desai and Tommaso Dreossi for assistance with the drone simulations. This work is supported in part by the National Science Foundation Graduate Research Fellowship Program under Grant No. DGE-1106400, by NSF grants CCF-1139138 and CNS-1646208, by DARPA under agreement number FA8750-16-C0043, and by TerraSwarm, one of six centers of STARnet, a Semiconductor Research Corporation program sponsored by MARCO and DARPA.

#### **References**


Languages. POPL 1989, pp. 179–190. ACM, New York (1989). http://doi.acm.org/ 10.1145/75277.75293


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **Constraint-Based Synthesis of Coupling Proofs**

Aws Albarghouthi¹ and Justin Hsu²,³

¹ University of Wisconsin–Madison, Madison, WI, USA
² University College London, London, UK
³ Cornell University, Ithaca, NY, USA
email@justinh.su

**Abstract.** *Proof by coupling* is a classical technique for proving properties about pairs of randomized algorithms by carefully *relating* (or *coupling*) two probabilistic executions. In this paper, we show how to automatically construct such proofs for probabilistic programs. First, we present f-*coupled postconditions*, an abstraction describing two correlated program executions. Second, we show how properties of f-coupled postconditions can imply various probabilistic properties of the original programs. Third, we demonstrate how to reduce the proof-search problem to a purely logical *synthesis problem* of the form ∃f. ∀X. ϕ, making probabilistic reasoning unnecessary. We develop a prototype implementation to automatically build coupling proofs for probabilistic properties, including uniformity and independence of program expressions.

#### **1 Introduction**

In this paper, we aim to automatically synthesize *coupling proofs* for probabilistic programs and properties. Originally designed for proving properties comparing two probabilistic programs—so-called *relational properties*—a coupling proof describes how to correlate two executions of the given programs, simulating both programs with a single probabilistic program. By reasoning about this combined, *coupled* process, we can often give simpler proofs of probabilistic properties for the original pair of programs.

A number of recent works have leveraged this idea to verify relational properties of randomized algorithms, including differential privacy [8,10,12], security of cryptographic protocols [9], convergence of Markov chains [11], robustness of machine learning algorithms [7], and more. Recently, Barthe et al. [6] showed how to reduce certain *non-relational* properties—which describe a single probabilistic program—to relational properties of two programs, by duplicating the original program or by sequentially composing it with itself.

While coupling proofs can simplify reasoning about probabilistic properties, they are not so easy to use; most existing proofs are carried out manually in relational program logics using interactive theorem provers. In a nutshell, the

The full version of this paper is available at https://arxiv.org/abs/1804.04052.

© The Author(s) 2018

H. Chockler and G. Weissenbacher (Eds.): CAV 2018, LNCS 10981, pp. 327–346, 2018. https://doi.org/10.1007/978-3-319-96145-3\_18

main challenge in a coupling proof is to select a correlation for each pair of corresponding sampling instructions, aiming to induce a particular relation between the outputs of the coupled process; this relation then implies the desired relational property. Just like finding inductive invariants in proofs for deterministic programs, picking suitable couplings in proofs can require substantial ingenuity.

To ease this task, we recently showed how to cast the search for coupling proofs as a program synthesis problem [1], giving a way to automatically find sophisticated proofs of differential privacy previously beyond the reach of automated verification. In the present paper, we build on this idea and present a general technique for constructing coupling proofs, targeting *uniformity* and *probabilistic independence* properties. Both are fundamental properties in the analysis of randomized algorithms, either in their own right or as prerequisites to proving more sophisticated guarantees; uniformity states that a randomized expression takes on all values in a finite range with equal probability, while probabilistic independence states that two probabilistic expressions are somehow uncorrelated—learning the value of one reveals no additional information about the value of the other.

Our techniques are inspired by the automated proofs of differential privacy we considered previously [1], but the present setting raises new technical challenges.

**Non-lockstep execution.** To prove differential privacy, the behavior of a single program is compared on two related inputs. To take advantage of the identical program structure, previous work restricted attention to *synchronizing* proofs, where the two executions can be analyzed assuming they follow the same control flow path. In contrast, coupling proofs for uniformity and independence often require relating two programs with different shapes, possibly following completely different control flows [6].

To overcome this challenge, we take a different approach. Instead of incrementally finding couplings for corresponding pairs of sampling instructions, which requires the executions to be tightly synchronized, we first lift all sampling instructions to the front of the program and pick a coupling once and for all. The remaining execution of both programs can then be encoded separately, with no need for lockstep synchronization (at least for loop-free programs; looping programs require a more careful treatment).

**Richer space of couplings.** The heart of a coupling proof is selecting among multiple possible options—a particular correlation for each pair of random sampling instructions. Random sampling in differentially private programs typically uses highly domain-specific distributions, like the Laplace distribution, which support a small number of useful couplings. Our prior work leveraged this feature to encode a collection of primitive couplings into the synthesis system. However, this is no longer possible when programs sample from distributions supporting richer couplings, like the uniform distribution. Since our approach coalesces all sampling instructions at the beginning of the program (more generally, at the head of the loop), we also need to find couplings for products of distributions.

We address this problem in two ways. First, we allow couplings of two sampling instructions to be specified by an injective function f from one range to another. Then, we impose requirements—encoded as standard logical constraints—to ensure that f indeed represents a coupling; we call such couplings f-*couplings*.

**More general class of properties.** Finally, we consider a broad class of properties rather than just differential privacy. While we focus on uniformity and independence for concreteness, our approach can establish general equalities between products of probabilities, i.e., probabilistic properties of the form

$$\prod\_{i=1}^{m} \Pr[e\_i \in E\_i] = \prod\_{j=1}^{n} \Pr[e'\_j \in E'\_j],$$

where e_i and e′_j are program expressions in the first and second programs respectively, and E_i and E′_j are predicates. As an example, we automatically establish a key step in the proof of Bertrand's Ballot theorem [20].

**Paper Outline.** After overviewing our technique on a motivating example (Sect. 2), we detail our main contributions.


We conclude by comparing our technique with related approaches (Sect. 7).

#### **2 Overview and Illustration**

#### **2.1 Introducing** *f***-Couplings**

**A Simple Example.** We begin by illustrating f-couplings over two identical Bernoulli distributions, denoted by the following *probability mass functions*: μ₁(x) = μ₂(x) = 0.5 for all x ∈ 𝔹 (where 𝔹 = {*true*, *false*}). In other words, the distribution μᵢ returns *true* with probability 0.5, and *false* with probability 0.5.

An f-*coupling* for μ₁, μ₂ is a function f : 𝔹 → 𝔹 from the domain of the first distribution (𝔹) to the domain of the second (also 𝔹); f should be injective and satisfy the *monotonicity property* μ₁(x) ≤ μ₂(f(x)) for all x ∈ 𝔹. In other words, f relates each element x ∈ 𝔹 with an element f(x) that has an equal or larger probability in μ₂. For example, consider the function f¬ defined as

$$f\_{\neg}(x) = \neg x.$$

This function relates *true* in μ₁ with *false* in μ₂, and vice versa. Observe that μ₁(x) ≤ μ₂(f¬(x)) for all x ∈ 𝔹, satisfying the definition of an f¬-coupling. We write μ₁ ⤳^{f¬} μ₂ when there is an f¬-coupling for μ₁ and μ₂.

**Using** f**-Couplings.** An f-coupling can imply useful properties about the distributions μ₁ and μ₂. For example, suppose we want to prove that μ₁(*true*) = μ₂(*false*). The fact that there is an f¬-coupling of μ₁ and μ₂ immediately implies the equality: by the monotonicity property,

$$\begin{aligned} \mu\_1(true) &\leqslant \mu\_2(f\_{\neg}(true)) = \mu\_2(false) \\ \mu\_1(false) &\leqslant \mu\_2(f\_{\neg}(false)) = \mu\_2(true) \end{aligned}$$

and therefore μ₁(*true*) = μ₂(*false*): since both distributions have total mass 1, the two inequalities must in fact be equalities. More generally, it suffices to find an f-coupling of μ₁ and μ₂ such that

$$\underbrace{\{(x, f(x)) \mid x \in \mathbb{B}\}}\_{\Psi\_f} \subseteq \{(z\_1, z\_2) \mid z\_1 = true \iff z\_2 = false\},$$

where Ψ_f is the relation induced by f; in particular, the f¬-coupling satisfies this property.
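To make these definitions concrete, the following small Python sketch (our own illustration, not part of the formal development; the names `is_f_coupling` and `f_neg` are ours) checks that f¬ is injective and monotone for two Bernoulli(0.5) mass functions, and that Ψ_{f¬} lies inside the target relation:

```python
from fractions import Fraction

B = [True, False]

# Two identical Bernoulli(1/2) distributions as probability mass functions.
mu1 = {b: Fraction(1, 2) for b in B}
mu2 = {b: Fraction(1, 2) for b in B}

def f_neg(x):
    return not x

def is_f_coupling(f, mu1, mu2, dom):
    """Check the two side conditions of an f-coupling: injectivity
    and the monotonicity property mu1(x) <= mu2(f(x))."""
    injective = len({f(x) for x in dom}) == len(dom)
    monotone = all(mu1[x] <= mu2[f(x)] for x in dom)
    return injective and monotone

assert is_f_coupling(f_neg, mu1, mu2, B)

# Psi_f lies inside the relation  z1 = true <=> z2 = false,
# which is what implies mu1(true) = mu2(false).
assert all((x is True) == (f_neg(x) is False) for x in B)
```

The same two checks (injectivity, monotonicity, and inclusion of Ψ_f in a target relation) are exactly the conditions the synthesis encoding of Sect. 4 discharges symbolically.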

#### **2.2 Simulating a Fair Coin**

Now, let's use f-couplings to prove more interesting properties. Consider the program fairCoin in Fig. 1; the program simulates a fair coin by flipping a possibly biased coin that returns *true* with probability p ∈ (0, 1), where p is a program parameter. Our goal is to prove that for any p, the output of the program is a uniform distribution—it simulates a fair coin. We consider two separate copies of fairCoin generating distributions μ₁ and μ₂ over the returned value

$$\begin{array}{l} \mathsf{fun}\ \mathsf{fairCoin}(p \in (0,1)) \\ \quad x \leftarrow \mathit{false} \\ \quad y \leftarrow \mathit{false} \\ \quad \mathsf{while}\ x = y\ \mathsf{do} \\ \qquad x \sim \mathsf{bern}(p) \\ \qquad y \sim \mathsf{bern}(p) \\ \quad \mathsf{return}\ x \end{array}$$

**Fig. 1.** Simulating a fair coin using an unfair one

x for the same bias p, and we construct a coupling showing μ₁(*true*) = μ₂(*false*), that is, heads and tails have equal probability.
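As a sanity check on what the coupling proof establishes, one can also compute fairCoin's output distribution exactly by conditioning a single loop iteration on the loop exiting (iterations are independent, so the exit round does not matter). This direct calculation is our own scaffolding, not the coupling argument itself:

```python
from fractions import Fraction

def fair_coin_output(p):
    """Exact output distribution of fairCoin(p): condition one sample
    (x, y) ~ bern(p) x bern(p) on the loop exiting, i.e. on x != y."""
    exit_true  = p * (1 - p)        # x = true,  y = false
    exit_false = (1 - p) * p        # x = false, y = true
    total = exit_true + exit_false  # probability the loop exits this round
    return {True: exit_true / total, False: exit_false / total}

# Uniform output for any bias p, computed with exact rationals.
for p in (Fraction(1, 10), Fraction(1, 3), Fraction(9, 10)):
    dist = fair_coin_output(p)
    assert dist[True] == dist[False] == Fraction(1, 2)
```

The point of the coupling proof is to reach the same conclusion *without* this kind of quantitative computation, which does not scale to more complex programs.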

**Constructing** f**-Couplings.** At first glance, it is unclear how to construct an f-coupling; unlike the distributions in our simple example, we do not have a concrete description of μ₁ and μ₂ as uniform distributions (indeed, this is what we are trying to establish). The key insight is that we do not need to construct our coupling in one shot. Instead, we can specify a coupling for the concrete, primitive sampling instructions in the body of the loop—which we know sample from bern(p)—and then extend it to an f-coupling for the whole loop and μ₁, μ₂.

For each copy of fairCoin, we coalesce the two sampling statements inside the loop into a single sampling statement from the product distribution:

$$x, y \sim \mathsf{bern}(p) \times \mathsf{bern}(p)$$

We have two such joint distributions bern(p) × bern(p) to couple, one from each copy of fairCoin. We use the following function f_swap : 𝔹² → 𝔹²:

$$f\_{swap}(x, y) = (y, x)$$

which exchanges the values of x and y. Since this is an injective function satisfying the monotonicity property

$$(\mathsf{bern}(p) \times \mathsf{bern}(p))(x, y) \leq (\mathsf{bern}(p) \times \mathsf{bern}(p))(f\_{swap}(x, y)),$$

for all (x, y) ∈ 𝔹 × 𝔹 and p ∈ (0, 1), we have an f_swap-coupling for the two copies of bern(p) × bern(p).
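The monotonicity property for f_swap can be checked mechanically for concrete biases; in the sketch below (our illustration, names ours) it in fact holds with equality, since the product mass function is symmetric in its two components:

```python
from fractions import Fraction
from itertools import product

B2 = list(product([True, False], repeat=2))

def bern2(p):
    """The product distribution bern(p) x bern(p) over pairs (x, y)."""
    w = {True: p, False: 1 - p}
    return {(x, y): w[x] * w[y] for (x, y) in B2}

def f_swap(xy):
    x, y = xy
    return (y, x)

for p in (Fraction(1, 4), Fraction(2, 3)):
    mu = bern2(p)
    # injectivity ...
    assert len({f_swap(v) for v in B2}) == len(B2)
    # ... and monotonicity (here with equality, by symmetry of the product)
    assert all(mu[v] <= mu[f_swap(v)] for v in B2)
```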

**Analyzing the Loop.** To extend an f_body-coupling on loop bodies to the entire loop, it suffices to check a synchronization condition: the coupling from f_body must ensure that the loop guards are equal, so the two executions synchronize at the loop head. This holds in our case: every time the first program executes the statement x, y ∼ bern(p) × bern(p), we can think of x, y as non-deterministically set to some values (a, b), and the corresponding variables in the second program as set to f_swap(a, b) = (b, a). The loop guards in the two programs are equivalent under this choice, since a = b is equivalent to b = a, hence we can analyze the loops in lockstep. In general, couplings enable us to relate samples from a pair of probabilistic assignments as if they were selected non-deterministically, often avoiding quantitative reasoning about probabilities.

Our constructed coupling for the loop guarantees that (*i*) both programs exit the loop at the same time, and (*ii*) when the two programs exit the loop, x takes opposite values in the two programs. In other words, there is an f_loop-coupling of μ₁ and μ₂ for some function f_loop such that

$$\Psi\_{f\_{loop}} \subseteq \{(z\_1, z\_2) \mid z\_1 = true \iff z\_2 = false\},\tag{1}$$

implying μ₁(*true*) = μ₂(*false*). Since both distributions are output distributions of fairCoin—hence μ₁ = μ₂—we conclude that fairCoin simulates a fair coin.

Note that our approach does not need to construct f_loop concretely—this function may be highly complex. Instead, we only need to show that Ψ_{f_loop} (or some over-approximation) lies inside the target relation in Formula 1.

**Achieving Automation.** Observe that once we have fixed an f_body-coupling for the sampling instructions inside the loop body, checking that the f_loop-coupling satisfies the conditions for uniformity (Formula 1) is essentially a program verification problem. Therefore, we can cast the problem of constructing a coupling proof as a logical problem of the form ∃f. ∀X. ϕ, where f is the f-coupling we need to discover and ∀X. ϕ is a constraint ensuring that (*i*) f indeed represents an f-coupling, and (*ii*) the f-coupling implies uniformity. Thus, we can use established synthesis-verification techniques to solve the resulting constraints (see, e.g., [2,13,27]).

#### **3 A Proof Rule for Coupling Proofs**

In this section, we develop a technique for constructing couplings and formalize proof rules for establishing uniformity and independence properties over program variables. We begin with background on probability distributions and couplings.

#### **3.1 Distributions and Couplings**

**Distributions.** A function μ : B → [0, 1] defines a *distribution* over a countable set B if Σ_{b∈B} μ(b) = 1. We will often write μ(A) for a subset A ⊆ B to mean Σ_{x∈A} μ(x). We write *dist*(B) for the set of all distributions over B.

We will need a few standard constructions on distributions. First, the *support* of a distribution μ is defined as *supp*(μ) = {b ∈ B | μ(b) > 0}. Second, for a distribution on pairs μ ∈ *dist*(B₁ × B₂), the first and second *marginals* of μ, denoted π₁(μ) and π₂(μ) respectively, are distributions over B₁ and B₂:

$$\pi\_1(\mu)(b\_1) \triangleq \sum\_{b\_2 \in B\_2} \mu(b\_1, b\_2) \qquad\qquad \pi\_2(\mu)(b\_2) \triangleq \sum\_{b\_1 \in B\_1} \mu(b\_1, b\_2).$$
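The two marginal operators admit a direct rendering with distributions as finite maps (a sketch; the helper `marginal` is ours):

```python
from fractions import Fraction

# A joint distribution over pairs, as a map from (b1, b2) to probability.
mu = {
    ('a', 0): Fraction(1, 4), ('a', 1): Fraction(1, 4),
    ('b', 0): Fraction(1, 2),
}

def marginal(mu, i):
    """Project a joint distribution onto component i (0 or 1) by
    summing the mass of all pairs agreeing on that component."""
    out = {}
    for pair, pr in mu.items():
        out[pair[i]] = out.get(pair[i], Fraction(0)) + pr
    return out

pi1, pi2 = marginal(mu, 0), marginal(mu, 1)
assert pi1 == {'a': Fraction(1, 2), 'b': Fraction(1, 2)}
assert pi2 == {0: Fraction(3, 4), 1: Fraction(1, 4)}
```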

**Couplings.** Let Ψ ⊆ B₁ × B₂ be a binary relation. A Ψ-*coupling* for distributions μ₁ and μ₂ over B₁ and B₂ is a distribution μ ∈ *dist*(B₁ × B₂) with (*i*) π₁(μ) = μ₁ and π₂(μ) = μ₂; and (*ii*) *supp*(μ) ⊆ Ψ. We write μ₁ ⤳^Ψ μ₂ when there exists a Ψ-coupling between μ₁ and μ₂.

An important fact is that an injective function f : B₁ → B₂ with μ₁(b) ≤ μ₂(f(b)) for all b ∈ B₁ induces a coupling between μ₁ and μ₂; this follows from a general theorem by Strassen [28], see also [23]. We write μ₁ ⤳^f μ₂ for μ₁ ⤳^{Ψ_f} μ₂, where Ψ_f = {(b₁, f(b₁)) | b₁ ∈ B₁}. The existence of a coupling can imply various useful properties about the two distributions. The following general fact will be the most important for our purposes—couplings can prove equalities between probabilities.

**Proposition 1.** *Let* E₁ ⊆ B₁ *and* E₂ ⊆ B₂ *be two events, and let* Ψ₌ = {(b₁, b₂) | b₁ ∈ E₁ ⟺ b₂ ∈ E₂}*. If* μ₁ ⤳^{Ψ₌} μ₂*, then* μ₁(E₁) = μ₂(E₂)*.*
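In the discrete setting, the induced coupling can be made concrete: since both distributions have total mass 1 and f is injective, the monotonicity inequalities μ₁(b) ≤ μ₂(f(b)) must hold with equality on the support, so placing mass μ₁(b) on each pair (b, f(b)) already yields a Ψ_f-coupling. The sketch below (our illustration) builds this coupling and checks Proposition 1 on a small example:

```python
from fractions import Fraction

B = [0, 1, 2, 3]
mu1 = {b: Fraction(1, 4) for b in B}   # uniform over {0, 1, 2, 3}
mu2 = {b: Fraction(1, 4) for b in B}
f = lambda b: (b + 1) % 4              # an injective, monotone map

# The induced coupling: put mass mu1(b) on the pair (b, f(b)).
mu = {(b, f(b)): mu1[b] for b in B}

# (i) both marginals are recovered ...
assert {b1: sum(pr for (x, _), pr in mu.items() if x == b1) for b1 in B} == mu1
assert {b2: sum(pr for (_, y), pr in mu.items() if y == b2) for b2 in B} == mu2

# ... and (ii) with E1 = {0}, E2 = {1}, every pair in the support
# satisfies b1 in E1 <=> b2 in E2, hence mu1(E1) = mu2(E2).
assert all((b1 in {0}) == (b2 in {1}) for (b1, b2) in mu)
assert mu1[0] == mu2[1]
```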

#### **3.2 Program Model**

Our program model uses an imperative language with probabilistic assignments, where we can draw a random value from primitive distributions. We consider the easier case of loop-free programs first; we consider looping programs in Sect. 5.

**Syntax.** A (loop-free) program P is defined using the following grammar:

$$P ::= v \leftarrow \mathit{exp} \mid v \sim \mathit{dexp} \mid \mathsf{if}\ \mathit{bexp}\ \mathsf{then}\ P\ \mathsf{else}\ P \mid P;\ P$$
where V is the set of variables that can appear in P, *exp* is an expression over V , and *bexp* is a Boolean expression over V . A probabilistic assignment samples from a probability distribution defined by expression *dexp*; for instance, *dexp* might be bern(p), the Bernoulli distribution with probability p of returning *true*. We use <sup>V</sup> <sup>I</sup> <sup>⊆</sup> V to denote the set of input program variables, which are never assigned to. All other variables are assumed to be defined before use.

We make a few simplifying assumptions. First, distribution expressions only mention input variables V^I; e.g., in the example above, bern(p), we have p ∈ V^I. Also, all programs are in *static single assignment* (ssa) form, where each variable is assigned to only once, and are well-typed. These assumptions are relatively minor; they can be verified using existing tools, or lifted entirely at the cost of slightly more complexity in our encoding.

**Semantics.** A state s of a program P is a valuation of all of its variables, represented as a map from variables to values, e.g., s(x) is the value of x <sup>∈</sup> V in s. We extend this mapping to expressions: s(*exp*) is the valuation of *exp* in s, and s(*dexp*) is the probability distribution defined by *dexp* in s.

We use S to denote the set of all possible program states. As is standard [24], we can give a semantics of P as a function ⟦P⟧ : S → *dist*(S) from states to distributions over states. For an output distribution μ = ⟦P⟧(s), we will abuse notation and write, e.g., μ(x = y) to denote the probability of the event that the program returns a state s′ where s′(x = y) = *true*.

**Self-Composition.** We will sometimes need to simulate two separate executions of a program with a single probabilistic program. Given a program P, we use <sup>P</sup><sup>i</sup> to denote a program identical to <sup>P</sup> but with all variables *tagged* with the subscript i. We can then define the *self-composition*: given a program P, the program <sup>P</sup><sup>1</sup>; <sup>P</sup><sup>2</sup> first executes <sup>P</sup><sup>1</sup>, and then executes the (separate) copy <sup>P</sup><sup>2</sup>.

#### **3.3 Coupled Postconditions**

We are now ready to present the f-*coupled postcondition*, an operator for approximating the outputs of two coupled programs.

**Strongest Postcondition.** We begin by defining a standard strongest postcondition operator over single programs, treating probabilistic assignments as no-ops. Given a set of states Q <sup>⊆</sup> S, we define post as follows:

$$\begin{aligned} \mathsf{post}(v \leftarrow \exp, Q) &= \{s[v \mapsto s(\exp)] \mid s \in Q\} \\ \mathsf{post}(v \sim \mathsf{dexp}, Q) &= Q \\ \mathsf{post}(\mathsf{if} \quad bexp \quad \mathsf{then} \quad P \quad \mathsf{else} \quad P', \ Q) &= \{s' \mid s \in Q, s' \in \mathsf{post}(P, s), s(bexp) = true\} \\ &\cup \{s' \mid s \in Q, s' \in \mathsf{post}(P', s), s(bexp) = false\} \\ \mathsf{post}(P; P', Q) &= \mathsf{post}(P', \mathsf{post}(P, Q)) \end{aligned}$$

where s[v <sup>→</sup> c] is state s with variable v mapped to the value c.
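The operator post admits a direct implementation over programs as nested tuples and states as dictionaries (a sketch under our own encoding; the constructors `'assign'`, `'sample'`, `'if'`, `'seq'` are ours):

```python
def post(P, Q):
    """Strongest postcondition over sets of states (dicts), treating
    probabilistic assignments as no-ops, as in the definition above."""
    kind = P[0]
    if kind == 'assign':                      # v <- exp
        _, v, exp = P
        return [dict(s, **{v: exp(s)}) for s in Q]
    if kind == 'sample':                      # v ~ dexp: no-op
        return list(Q)
    if kind == 'if':                          # if bexp then P1 else P2
        _, bexp, P1, P2 = P
        return (post(P1, [s for s in Q if bexp(s)]) +
                post(P2, [s for s in Q if not bexp(s)]))
    if kind == 'seq':                         # P1; P2
        _, P1, P2 = P
        return post(P2, post(P1, Q))
    raise ValueError(kind)

# if x then y <- 1 else y <- 2
prog = ('if', lambda s: s['x'],
        ('assign', 'y', lambda s: 1),
        ('assign', 'y', lambda s: 2))
out = post(prog, [{'x': True}, {'x': False}])
assert out == [{'x': True, 'y': 1}, {'x': False, 'y': 2}]
```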

f**-Coupled Postcondition.** We rewrite programs so that all probabilistic assignments are combined into a single probabilistic assignment to a vector of variables appearing at the beginning of the program, i.e., an assignment of the form *v* ∼ *dexp* in P and *v′* ∼ *dexp′* in P′, where *v*, *v′* are vectors of variables. For instance, we can combine x ∼ bern(0.5); y ∼ bern(0.5) into the single statement x, y ∼ bern(0.5) × bern(0.5).

Let B, B′ be the domains of *v* and *v′*, f : B → B′ be a function, and Q ⊆ S × S′ be a set of pairs of input states, where S and S′ are the states of P and P′, respectively. We define the f-coupled postcondition operator cpost as

$$\mathsf{cpost}(P, P', Q, f) = \{ (\mathsf{post}(P, s), \mathsf{post}(P', s')) \mid (s, s') \in Q' \}$$

$$\text{where } Q' = \{ (s[\mathsf{v} \mapsto \mathsf{b}], s'[\mathsf{v}' \mapsto f(\mathsf{b})]) \mid (s, s') \in Q, \mathsf{b} \in B \},$$

$$\text{assuming that}\quad \forall (s, s') \in Q.\ s(\mathit{dexp}) \leadsto^{f} s'(\mathit{dexp'}).\tag{2}$$

The intuition is that the values drawn from sampling assignments in both programs are coupled using the function f. Note that this operation non-deterministically assigns *v* from P with some values *b*, and *v′* with f(*b*). Then, the operation simulates the executions of the two programs. Formula 2 states that there is an f-coupling for every instantiation of the two distributions used in probabilistic assignments in both programs.

*Example 1.* Consider the simple program P defined as x ∼ bern(0.5); x ← ¬x, and let f¬(x) = ¬x. Then, cpost(P, P, Q, f¬) is {(s, s′) | s(x) = ¬s′(x)}.
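Example 1 can be replayed mechanically: couple the sampled variable via f¬, run the deterministic remainder of both copies, and inspect the resulting pairs (a sketch; `run_deterministic` and `cpost` here are our simplified renderings specialized to this one program):

```python
B = [True, False]
f_neg = lambda b: not b

def run_deterministic(s):
    # the deterministic part of P after the sampling:  x <- not x
    return dict(s, x=not s['x'])

def cpost(f, states):
    """f-coupled postcondition of P with itself: assign the sampled
    variable x to b on the left and f(b) on the right, then run both."""
    out = []
    for (s, s2) in states:
        for b in B:
            left  = run_deterministic(dict(s,  x=b))
            right = run_deterministic(dict(s2, x=f(b)))
            out.append((left, right))
    return out

pairs = cpost(f_neg, [({}, {})])
# Every coupled pair satisfies s(x) = not s'(x), as claimed in Example 1.
assert all(l['x'] == (not r['x']) for l, r in pairs)
```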

The main soundness theorem shows there is a probabilistic coupling of the output distributions with support contained in the coupled postcondition (we defer all proofs to the full version of this paper).

**Theorem 1.** *Let programs* P *and* P′ *be of the form* *v* ∼ *dexp*; P_D *and* *v′* ∼ *dexp′*; P′_D*, for deterministic programs* P_D, P′_D*. Given a function* f : B → B′ *satisfying Formula 2, for every* (s, s′) ∈ S × S′ *we have* ⟦P⟧(s) ⤳^Ψ ⟦P′⟧(s′)*, where* Ψ = cpost(P, P′, {(s, s′)}, f)*.*

#### **3.4 Proof Rules for Uniformity and Independence**

We are now ready to demonstrate how to establish uniformity and independence of program variables using f-coupled postconditions. We will continue to assume that random sampling commands have been lifted to the front of each program, and that f satisfies Formula 2.

**Uniformity.** Consider a program P and a variable v∗ ∈ V of finite, non-empty domain B. Let μ = ⟦P⟧(s) for some state s ∈ S. We say that variable v∗ is *uniformly distributed* in μ if μ(v∗ = b) = 1/|B| for every b ∈ B.

The following theorem connects uniformity with f-coupled postconditions.

**Theorem 2 (Uniformity).** *Consider a program* P *with* *v* ∼ *dexp* *as its first statement and a designated return variable* v∗ ∈ V *with domain* B*. Let* Q = {(s, s) | s ∈ S} *be the input relation. If we have*

$$\exists f. \mathsf{cpost}(P, P, Q, f) \subseteq \{(s, s') \in S \times S \mid s(v^\*) = b \iff s'(v^\*) = b'\}$$

*for all* b, b′ ∈ B*, then for any input* s ∈ S *the final value of* v∗ *is uniformly distributed over* B *in* ⟦P⟧(s)*.*

The intuition is that in the two f-coupled copies of P, the first v∗ is equal to b exactly when the second v∗ is equal to b′. Hence, the probabilities of returning b in the first copy and b′ in the second copy are the same. Repeating for every pair of values b, b′, we conclude that v∗ is uniformly distributed.

*Example 2.* Recall Example 1 and let b = *true* and b′ = *false*. We have

$$\mathsf{cpost}(P, P, Q, f\_{\neg}) \subseteq \{(s, s') \in S \times S \mid s(x) = b \iff s'(x) = b'\}.$$

This is sufficient to prove uniformity (the case with b = b′ is trivial).

**Independence.** We now present a proof rule for independence. Consider a program P and two variables v∗, w∗ ∈ V with domains B and B′, respectively. Let μ = ⟦P⟧(s) for some state s ∈ S. We say that v∗, w∗ are *probabilistically independent* in μ if μ(v∗ = b ∧ w∗ = b′) = μ(v∗ = b) · μ(w∗ = b′) for every b ∈ B and b′ ∈ B′.
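As a concrete instance of this definition (our own illustration, not an example from the formal development): in the program x ∼ bern(0.5); y ∼ bern(0.5); z ← x ⊕ y, the variables x and z are independent, which we can verify by exact enumeration of the output distribution:

```python
from fractions import Fraction
from itertools import product

B = [True, False]

# Output distribution over (x, z) of:
#   x ~ bern(1/2); y ~ bern(1/2); z <- x xor y
mu = {}
for x, y in product(B, B):
    z = (x != y)
    mu[(x, z)] = mu.get((x, z), Fraction(0)) + Fraction(1, 4)

def pr(event):
    return sum(p for k, p in mu.items() if event(k))

# x and z are probabilistically independent: every joint probability
# factors into the product of the marginals.
for a, b in product(B, B):
    joint = pr(lambda k: k == (a, b))
    assert joint == pr(lambda k: k[0] == a) * pr(lambda k: k[1] == b)
```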

The following theorem connects independence with f-coupled postconditions. We will self-compose two tagged copies of P, called P₁ and P₂.

**Theorem 3 (Independence).** *Assume a program* P *and define the relation*

$$Q = \{ (s, s\_1 \oplus s\_2) \mid s \in S, s\_i \in S\_i, s(v) = s\_i(v\_i), \text{ for all } v \in V^I \},$$

*where* ⊕ *takes the union of two maps with disjoint domains. Fix some* v∗, w∗ ∈ V *with domains* B, B′*, and assume that for all* b ∈ B, b′ ∈ B′*, there exists a function* f *such that* cpost(P, (P₁; P₂), Q, f) *is contained in*

$$\left\{ (s', s\_1' \oplus s\_2') \mid s'(v^\*) = b \land s'(w^\*) = b' \iff s\_1'(v\_1^\*) = b \land s\_2'(w\_2^\*) = b' \right\}.$$

*Then,* w∗, v∗ *are independently distributed in* ⟦P⟧(s) *for all inputs* s ∈ S*.*

The idea is that under the coupling, the probability of P returning v∗ = b ∧ w∗ = b′ is the same as the probability of P₁ returning v₁∗ = b and P₂ returning w₂∗ = b′, for all values b, b′. Since P₁ and P₂ are two independent executions of P by construction, this establishes independence of v∗ and w∗.

#### **4 Constraint-Based Formulation of Proof Rules**

In Sect. 3, we formalized the problem of constructing a coupling proof using f-coupled postconditions. We now automatically find such proofs by posing the problem as a constraint, where a solution gives a function f establishing our desired property.

#### **4.1 Generating Logical and Probabilistic Constraints**

**Logical Encoding.** We first encode program executions as formulas in first-order logic, using the following encoding function:

$$\mathsf{enc}(v \leftarrow \exp) \triangleq v = \exp$$

$$\mathsf{enc}(v \sim \mathsf{dex}p) \triangleq \mathsf{true}$$

$$\mathsf{enc}(\mathsf{if}\ bexp\ \mathsf{then}\ P\ \mathsf{else}\ P') \triangleq (bexp \Rightarrow \mathsf{enc}(P)) \land (\neg bexp \Rightarrow \mathsf{enc}(P'))$$

$$\mathsf{enc}(P; P') \triangleq \mathsf{enc}(P) \land \mathsf{enc}(P')$$

We assume a direct correspondence between expressions in our language and the first-order theory used for our encoding, e.g., linear arithmetic. Note that the encoding disregards probabilistic assignments, encoding them as *true*; this mimics the semantics of our strongest postcondition operator post. Probabilistic assignments will be handled via a separate encoding of f-couplings.
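The encoding can be prototyped by interpreting enc(P) as a predicate over a single valuation of all (SSA) variables; the conditional case below uses the if-then-else form, which is equivalent to (bexp ⇒ enc(P)) ∧ (¬bexp ⇒ enc(P′)). This is our own executable sketch, not the first-order encoding used by an actual solver:

```python
def enc(P):
    """Encode a (loop-free, SSA) program as a predicate over a single
    valuation of all variables; sampling statements encode to true."""
    kind = P[0]
    if kind == 'assign':                      # v <- exp  becomes  v = exp
        _, v, exp = P
        return lambda s: s[v] == exp(s)
    if kind == 'sample':                      # v ~ dexp  becomes  true
        return lambda s: True
    if kind == 'if':                          # conditional conjunction
        _, bexp, P1, P2 = P
        e1, e2 = enc(P1), enc(P2)
        return lambda s: (e1(s) if bexp(s) else e2(s))
    if kind == 'seq':                         # conjunction
        _, P1, P2 = P
        e1, e2 = enc(P1), enc(P2)
        return lambda s: e1(s) and e2(s)
    raise ValueError(kind)

# x ~ bern(0.5); y <- not x
prog = ('seq', ('sample', 'x', 'bern(0.5)'),
               ('assign', 'y', lambda s: not s['x']))
phi = enc(prog)
assert phi({'x': True, 'y': False}) and not phi({'x': True, 'y': True})
```

In line with Lemma 1 below, the satisfying valuations of enc(P) with fixed inputs are exactly the states produced by post.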

As expected, enc reflects the strongest postcondition post.

**Lemma 1.** *Let* P *be a program and let* ρ *be any assignment of the variables. An assignment* ρ′ *agreeing with* ρ *on all input variables* V^I *satisfies the constraint* enc(P)[ρ′/V] *precisely when* post(P, {ρ}) = {ρ′}*, treating* ρ, ρ′ *as program states.*

**Uniformity Constraints.** We can encode the conditions in Theorem 2 for showing uniformity as a logical constraint. For a program P and a copy P<sup>1</sup>, with first statements *v* ∼ *dexp* and *v*<sup>1</sup> ∼ *dexp*1, we define the constraints:

$$\forall a, a'.\ \exists f.\ \forall V, V\_1.\ (V^I = V\_1^I \land v\_1 = f(v) \land \mathsf{enc}(P) \land \mathsf{enc}(P\_1)) \implies (v^\* = a \iff v\_1^\* = a') \tag{3}$$

$$V^I = V\_1^I \implies \operatorname{dexp} \leadsto^f \operatorname{dexp\_1} \tag{4}$$

Note that this is a second-order formula, as it quantifies over the *uninterpreted function* f. The left side of the implication in Formula 3 encodes an f-coupled execution of P and P₁, starting from equal initial states. The right side of this implication encodes the conditions for uniformity, as in Theorem 2.

Formula 4 ensures that there is an f-coupling between *dexp* and *dexp*₁ for any initial state; recall that *dexp* may mention input variables V^I. The constraint *dexp* ⤳^f *dexp*₁ is not a standard logical constraint—intuitively, it is satisfied if *dexp* ⤳^f *dexp*₁ holds for some interpretation of f, *dexp*, and *dexp*₁.

*Example 3.* The constraint

$$\exists f.\ \forall p, p'.\ p = p' \implies \mathsf{bern}(p) \leadsto^{f} \mathsf{bern}(p')$$

holds by setting f to the identity function id, since for any p = p′ we have an f-coupling bern(p) ⤳^{id} bern(p′).

*Example 4.* Consider the program x <sup>∼</sup> bern(0.5); y <sup>=</sup> <sup>¬</sup>x. The constraints for uniformity of y are

$$\forall a, a'. \exists f. \forall V, V\_1. (x\_1 = f(x) \land y = \neg x \land y\_1 = \neg x\_1) \Longrightarrow (y = a \iff y\_1 = a'),$$

$$\mathsf{bern}(0.5) \leadsto^{f} \mathsf{bern}(0.5).$$

Since there are no input variables, V^I = V₁^I is equivalent to *true*.
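For this example the synthesis problem ∃f. ∀X. ϕ can even be solved by brute force: there are only four functions f : 𝔹 → 𝔹, and since bern(0.5) assigns equal mass to both values, any injective f satisfies the coupling side condition. A sketch (our own, outside the paper's solver pipeline):

```python
from itertools import product

B = [True, False]

# All four functions f : B -> B, represented as value tables.
candidate_fs = [dict(zip(B, vals)) for vals in product(B, repeat=2)]

def injective(f):
    # For bern(0.5), injectivity alone suffices for the f-coupling.
    return len(set(f.values())) == len(B)

def uniformity_constraint_holds(f, a, a2):
    # forall x: with x1 = f(x), y = not x, y1 = not x1,
    # require  y = a  <=>  y1 = a2   (Formula 3 for Example 4)
    for x in B:
        x1 = f[x]
        y, y1 = (not x), (not x1)
        if (y == a) != (y1 == a2):
            return False
    return True

# For every pair (a, a'), some injective f witnesses the constraint,
# so y is uniformly distributed.
for a, a2 in product(B, B):
    assert any(injective(f) and uniformity_constraint_holds(f, a, a2)
               for f in candidate_fs)
```

For (a, a′) = (*true*, *false*) the witness found is the negation function; for (a, a′) = (*true*, *true*) the identity suffices.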

**Theorem 4 (Uniformity constraints).** *Fix a program* P *and variable* v∗ ∈ V*. Let* ϕ *be the uniformity constraints in Formulas 3 and 4. If* ϕ *is valid, then* v∗ *is uniformly distributed in* ⟦P⟧(s) *for all* s ∈ S*.*

**Independence Constraints.** Similarly, we can characterize independence constraints using the conditions in Theorem 3. After transforming the program P₁; P₂ to start with the single probabilistic assignment statement *v*₁,₂ ∼ *dexp*₁,₂, combining the probabilistic assignments in P₁ and P₂, we define the constraints:

$$\forall a, a'.\ \exists f.\ \forall V, V\_1, V\_2.\ (V^I = V\_1^I = V\_2^I \land v\_{1,2} = f(v) \land \mathsf{enc}(P) \land \mathsf{enc}(P\_1; P\_2)) \implies (v^\* = a \land w^\* = a' \iff v\_1^\* = a \land w\_2^\* = a') \tag{5}$$

$$V^I = V^I\_1 = V^I\_2 \implies \operatorname{dexp} \leadsto^f \operatorname{dexp}\_{1,2} \tag{6}$$

**Theorem 5 (Independence constraints).** *Fix a program* P *and two variables* v∗, w∗ ∈ V*. Let* ϕ *be the independence constraints from Formulas 5 and 6. If* ϕ *is valid, then* v∗, w∗ *are independent in* ⟦P⟧(s) *for all* s ∈ S*.*

#### **4.2 Constraint Transformation**

To solve our constraints, we transform them into the form $\exists f.\ \forall X.\ \varphi$, where $\varphi$ is a first-order formula. Such formulas can be viewed as *synthesis problems* and are often solvable automatically using standard techniques.

We perform the transformation in two steps. First, we rewrite the constraint into the form $\exists f.\ \forall X.\ \varphi_p$, where $\varphi_p$ still contains the coupling constraint. Then, we replace the coupling constraint with a first-order formula by logically encoding the primitive distributions as uninterpreted functions.

**Quantifier Reordering.** Our constraints are of the form $\forall a, a'.\ \exists f.\ \forall X.\ \varphi$. Intuitively, this means that for *every* possible value of $a, a'$, we want *one* function $f$ satisfying $\forall X.\ \varphi$. We can pull the existential quantifier $\exists f$ to the outermost level by extending the function with additional parameters for $a, a'$, thus defining a different function for every interpretation of $a, a'$. For the uniformity constraints this transformation yields the following formulas:

$$\exists g.\ \forall a, a'.\ \forall V, V_1.\ (V^I = V^I_1 \land v_1 = g(a, a', v) \land \mathsf{enc}(P) \land \mathsf{enc}(P_1)) \implies (v^* = a \iff v^*_1 = a') \tag{7}$$

$$V^I = V^I_1 \implies \operatorname{dexp} \leadsto^{g(a, a', -)} \operatorname{dexp}_1 \tag{8}$$

where $g(a, a', -)$ is the function obtained by partially applying $g$.

**Transforming Coupling Constraints.** Our next step is to eliminate coupling constraints. To do so, we use the definition of $f$-coupling, which states that $\mu_1 \leadsto^f \mu_2$ if (*i*) $f$ is injective and (*ii*) $\forall x.\ \mu_1(x) \leq \mu_2(f(x))$. The first condition (injectivity) is straightforward. For the second condition (monotonicity), we can encode distribution expressions—which represent functions to the reals—as uninterpreted functions, which we then further constrain. For instance, the coupling constraint $\mathsf{bern}(p) \leadsto^f \mathsf{bern}(p')$ can be encoded as

$$\begin{aligned} \forall x, y.\ & x \neq y \Rightarrow f(x) \neq f(y) & \text{(injective)}\\ \forall x.\ & h(x) \leqslant h'(f(x)) & \text{(monotonicity)}\\ \forall x.\ & \mathsf{ite}(x = \mathit{true},\ h(x) = p,\ h(x) = 1 - p) & \text{(bern($p$) encoding)}\\ \forall x.\ & \mathsf{ite}(x = \mathit{true},\ h'(x) = p',\ h'(x) = 1 - p') & \text{(bern($p'$) encoding)} \end{aligned}$$

where $h, h' : \mathbb{B} \to \mathbb{R}_{\geq 0}$ are uninterpreted functions representing the probability mass functions of $\mathsf{bern}(p)$ and $\mathsf{bern}(p')$; note that the third constraint encodes the distribution $\mathsf{bern}(p)$, which returns *true* with probability $p$ and *false* with probability $1 - p$, and the fourth constraint encodes $\mathsf{bern}(p')$.
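For a finite domain such as $\mathbb{B}$, both side conditions of an $f$-coupling can be checked by exhaustive enumeration. The following minimal sketch (ours, not the paper's implementation) checks the candidate $f(x) = \neg x$ for $\mathsf{bern}(0.5) \leadsto^f \mathsf{bern}(0.5)$:

```python
from itertools import product

# Candidate coupling function for bern(0.5) ~f~ bern(0.5): negation.
def f(x):
    return not x

# pmf of bern(0.5): both outcomes carry mass 0.5.
def h(x):
    return 0.5

B = [True, False]

# (injective): x != y implies f(x) != f(y)
assert all(f(x) != f(y) for x, y in product(B, B) if x != y)

# (monotonicity): h(x) <= h'(f(x)); here h' = h, since both sides are bern(0.5)
assert all(h(x) <= h(f(x)) for x in B)
```

Both conditions hold, so negation induces a valid coupling of the fair coin flip with itself.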

Note that if we cannot encode the definition of the distribution in our first-order theory (e.g., if it requires non-linear constraints), or if we do not have a concrete description of the distribution, we can simply elide the last two constraints and under-constrain $h$ and $h'$. In Sect. 6 we use this feature to prove properties of a program encoding a Bayesian network, where the primitive distributions are unknown program parameters.

**Theorem 6 (Transformation soundness).** *Let* $\varphi$ *be the constraints generated for some program* $P$*. Let* $\varphi'$ *be the result of applying the above transformations to* $\varphi$*. If* $\varphi'$ *is valid, then* $\varphi$ *is valid.*

**Constraint Solving.** After performing these transformations, we finally arrive at constraints of the form $\exists g.\ \forall a, a'.\ \forall V.\ \varphi$, where $\varphi$ is a first-order formula. These exactly match constraint-based program synthesis problems. In Sect. 6, we use SMT solvers and enumerative synthesis to handle these constraints.

#### **5 Dealing with Loops**

So far, we have only considered loop-free programs. In this section, we extend our approach to programs with loops.

**f-Coupled Postconditions and Loops.** We consider programs of the form

$$\mathsf{while}\ \mathit{bexp}\ \mathsf{do}\ P_b$$

where $P_b$ is a loop-free program that begins with the statement $v \sim \mathit{dexp}$; our technique can also be extended to handle nested loops. We assume all programs terminate with probability 1 from any initial state; there are numerous systems for verifying this basic property automatically (see, e.g., [15–17]). To extend our $f$-coupled postconditions, we let $\mathsf{cpost}(P, P', Q, f)$ be the smallest set $I$ satisfying:

$$\begin{aligned} Q &\subseteq I & \text{(initiation)}\\ \mathsf{cpost}(P^b, P^{b\prime}, I_{en}, f) &\subseteq I & \text{(consecution)}\\ I &\subseteq \{(s, s') \in S \times S' \mid s(\mathit{bexp}) = s'(\mathit{bexp}')\} & \text{(synchronization)}\end{aligned}$$

where $I_{en} \triangleq \{(s, s') \in I \mid s(\mathit{bexp}) = \mathit{true}\}$.

Intuitively, the set I is the least inductive invariant for the two coupled programs running with synchronized loops. Theorem 1, which establishes that f-coupled postconditions result in couplings over output distributions, naturally extends to a setting with loops.

**Constraint Generation.** To prove uniformity, we generate constraints much like in the loop-free case, except that we capture the invariant $I$, modeled as a relation over the variables of both programs, using a *Constrained Horn-Clause* (CHC) encoding. As is standard, we use $V', V'_1$ to denote primed copies of the program variables, describing their values after executing the loop body, and we assume that $\mathsf{enc}(P^b)$ encodes a loop-free program as a transition relation from states over $V$ to states over $V'$.

$$\begin{aligned}
\forall a, a'.\ \exists f, I.\ &\forall V, V_1, V', V'_1.\\
V^I = V^I_1 &\implies I(V, V_1) & \text{(initiation)}\\
I(V, V_1) \land \mathit{bexp} \land v'_1 = f(v') \land \mathsf{enc}(P^b) \land \mathsf{enc}(P^b_1) &\implies I(V', V'_1) & \text{(consecution)}\\
I(V, V_1) &\implies (\mathit{bexp} \iff \mathit{bexp}_1) & \text{(synchronization)}\\
I(V, V_1) \land \mathit{bexp} &\implies \operatorname{dexp} \leadsto^f \operatorname{dexp}_1 & \text{(coupling)}\\
I(V, V_1) \land \neg\mathit{bexp} &\implies (v^* = a \iff v^*_1 = a') & \text{(uniformity)}
\end{aligned}$$

The first three constraints encode the definition of $\mathsf{cpost}$; the last two ensure that $f$ constructs a coupling and that the invariant implies the uniformity condition when the loop terminates. Using the technique presented in Sect. 4.2, we can transform these constraints into the form $\exists f, I.\ \forall X.\ \varphi$. That is, in addition to discovering the function $f$, we need to discover the invariant $I$.

Proving independence in looping programs poses additional challenges, as directly applying the self-composition construction from Sect. 3 requires relating a single loop with two loops. When the number of loop iterations is deterministic, however, we may simulate two sequentially composed loops with a single loop that interleaves the iterations (known as *synchronized* or *cross* product [4,29]) so that we reduce the synthesis problem to finding a coupling for two loops.

#### **6 Implementation and Evaluation**

We now discuss our implementation and five case studies used for evaluation.


**Fig. 2.** Case study programs

**Implementation.** To solve formulas of the form $\exists f.\ \forall X.\ \varphi$, we implemented a simple solver using a *guess-and-check* loop: we iterate through various interpretations of $f$, insert them into the formula, and check whether the resulting formula is valid. In the simplest case, we are searching for a function $f$ from $n$-tuples to $n$-tuples. For instance, in Sect. 2.2, we discovered the function $f(x, y) = (y, x)$. Our implementation is parameterized by a grammar defining an infinite set of interpretations of $f$, which involves permuting the arguments (as above), conditionals, and other basic operations (e.g., negation for Boolean variables). For checking validity of $\forall X.\ \varphi$ given $f$, we use the Z3 SMT solver [19] for loop-free programs. For loops, we use an existing constrained-Horn-clause solver based on the MathSAT SMT solver [18].
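The guess-and-check loop can be sketched as follows. For illustration only, the SMT validity query is replaced by exhaustive evaluation over a small Boolean domain, and the candidate grammar and all names are ours, not the paper's:

```python
from itertools import product

# Toy guess-and-check synthesis: enumerate candidate interpretations of f
# and return the first one that makes the constraint valid.
CANDIDATES = {
    "identity": lambda x, y: (x, y),
    "swap":     lambda x, y: (y, x),
    "negate":   lambda x, y: (not x, not y),
}

def valid(f):
    # Stand-in for "forall X. phi": here, f must swap its two arguments.
    return all(f(x, y) == (y, x) for x, y in product([True, False], repeat=2))

def synthesize():
    for name, f in CANDIDATES.items():   # "guess" step over the grammar
        if valid(f):                     # "check" step; a real solver queries Z3
            return name
    return None

assert synthesize() == "swap"
```

A real implementation would enumerate terms from the grammar in increasing size and discharge each check as an SMT validity query.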

**Benchmarks and Results.** As a set of case studies for our approach, we use 5 different programs collected from the literature and presented in Fig. 2. For these programs, we prove uniformity, (conditional) independence properties, and other probabilistic equalities. For instance, we use our implementation to prove a main lemma for the Ballot theorem [20], encoded as the program ballot.

Figure 3 shows the time and number of loop iterations required by our implementation to discover a coupling proof. The small number of iterations and the short running times demonstrate the simplicity of the discovered proofs. For instance, the ballot theorem was proved in 3 s and only 4 iterations, while the fairCoin example (illustrated in Sect. 2.2) required only two iterations and 1.4 s. In all cases, the size of the synthesized function $f$, measured as the depth of its AST, is no more than 4. We describe these programs and properties in a bit more detail.

**Case Studies: Uniformity (fairCoin, fairDie).** The first two programs produce uniformly random values. Our approach synthesizes a coupling proof certifying uniformity for both of these programs. The first program fairCoin, which we saw in Sect. 2.2, produces a fair coin flip given access to biased coin flips by repeatedly flipping two coins while they are equal, and returning the result of the first coin as soon as the flips differ. Note that the bias of the coin flip is a program parameter, and not fixed statically. The synthesized coupling swaps the result of the two samples, mapping the values of (x, y) to (y, x).
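The uniformity of fairCoin can also be confirmed by direct calculation: conditioned on the two flips differing, the two possible outcomes of the first coin carry equal mass $p(1-p)$. A small sketch of this computation (ours, not the paper's):

```python
# Exact check (no sampling): flip two coins with bias p until they differ;
# the returned value of the first coin is fair regardless of p.
def fair_coin_distribution(p):
    # P(x=true, y=false) = p*(1-p);  P(x=false, y=true) = (1-p)*p.
    # Conditioning on x != y normalizes these two equal masses.
    stop = p * (1 - p) + (1 - p) * p
    return {True: p * (1 - p) / stop, False: (1 - p) * p / stop}

for p in [0.1, 0.25, 0.5, 0.9]:
    d = fair_coin_distribution(p)
    assert abs(d[True] - 0.5) < 1e-12 and abs(d[False] - 0.5) < 1e-12
```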

The second program fairDie gives a different construction for simulating a roll of a fair die given fair coin flips. Three fair coins are repeatedly flipped as long as they are all equal; the returned triple is the binary representation of a number in $\{1, \ldots, 6\}$, the result of the simulated roll. The synthesized coupling is a bijection on triples of Booleans $\mathbb{B} \times \mathbb{B} \times \mathbb{B}$; fixing any two distinct possible output triples $(b_1, b_2, b_3)$ and $(b'_1, b'_2, b'_3)$, the coupling maps $(b_1, b_2, b_3) \mapsto (b'_1, b'_2, b'_3)$ and vice versa, leaving all other triples unchanged.
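The uniformity claim for fairDie can likewise be verified by enumeration; the following sketch (our illustration) counts the triples that survive the loop guard and computes their conditional mass:

```python
from itertools import product

# Flip three fair coins until they are not all equal; the six surviving
# triples simulate a fair die roll.
triples = [t for t in product([0, 1], repeat=3) if len(set(t)) > 1]
assert len(triples) == 6          # (0,0,0) and (1,1,1) are rejected

# Each surviving triple has unconditional mass 1/8; conditioning on the
# loop exiting (total mass 6/8) gives each triple mass 1/6.
prob = (1 / 8) / (6 / 8)
assert abs(prob - 1 / 6) < 1e-12
```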


**Fig. 3.** Statistics

**Case Studies: Independence (noisySum, bayes).** In the next two programs, our approach synthesizes coupling proofs of independence and conditional independence of program variables in the output distribution. The first program, noisySum, is a stylized program inspired by privacy-preserving algorithms that sum a series of noisy samples; to give accuracy guarantees, it is often important to show that the noisy draws are probabilistically independent. We show that any pair of samples are independent.

The second program, bayes, models a simple Bayesian network with three independent variables $x, y, z$ and two dependent variables $w$ and $w'$, computed from $(x, y)$ and $(y, z)$ respectively. We want to show that $w$ and $w'$ are independent conditioned on any value of $y$; intuitively, $w$ and $w'$ depend on each other only through the value of $y$, and are independent otherwise. We use a constraint encoding similar to the encoding for showing independence to find a coupling proof of this fact. Note that the distributions $\mu, \mu', \mu''$ of $x, y, z$ are unknown parameters, and the functions $f$ and $g$ are also uninterpreted. This illustrates the advantage of using a constraint-based technique—we can encode unknown distributions and operations as uninterpreted functions.

**Case Studies: Probabilistic Equalities (ballot).** As we mentioned in Sect. 1, our approach extends naturally to proving general probabilistic equalities beyond uniformity and independence. To illustrate, we consider a lemma used to prove Bertrand's Ballot theorem [20]. Roughly speaking, this theorem considers counting ballots one-by-one in an election with $n_A$ votes cast for candidate $A$ and $n_B$ votes cast for candidate $B$, where $n_A, n_B$ are parameters. If $n_A > n_B$ (so $A$ is the winner) and votes are counted in a uniformly random order, the Ballot theorem states that the probability that $A$ leads throughout the whole counting process—without any ties—is precisely $(n_A - n_B)/(n_A + n_B)$.

One way of proving this theorem, sometimes called André's reflection principle, is to show that the probability of counting the first vote for $A$ and reaching a tie is equal to the probability of counting the first vote for $B$ and reaching a tie. We simulate the counting process slightly differently—instead of drawing a uniform order to count the votes, our program draws uniform samples for votes—but the original target property is equivalent to the equality

$$\Pr[\mathit{first}_1 = 0 \land \mathit{tie}_1 \land \psi(x_{A1}, x_{B1})] = \Pr[\mathit{first}_2 = 1 \land \mathit{tie}_2 \land \psi(x_{A2}, x_{B2})] \tag{9}$$

where $\psi(x_{Ai}, x_{Bi})$ denotes $x_{Ai} = n_A \land x_{Bi} = n_B$. Our approach synthesizes a coupling and a loop invariant showing that the coupled postcondition is contained in

$$\{(s_1, s_2) \mid s_1(\mathit{first} = 0 \land \mathit{tie} \land \psi(x_A, x_B)) \iff s_2(\mathit{first} = 1 \land \mathit{tie} \land \psi(x_A, x_B))\},$$

giving Formula (9) by Proposition 1 (see Barthe et al. [6] for more details).
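For small parameters, the reflection lemma behind Formula (9) can be validated by brute force. The following sketch (our illustration, with $n_A = 4$ and $n_B = 2$) enumerates all counting orders:

```python
from itertools import product

# Does a counting order ever reach a tie (equal counts for A and B)?
def reaches_tie(order):
    lead = 0
    for v in order:
        lead += 1 if v == 'A' else -1
        if lead == 0:
            return True
    return False

n_A, n_B = 4, 2
orders = [o for o in product('AB', repeat=n_A + n_B) if o.count('A') == n_A]

# Reflection lemma: as many tie-reaching orders start with A as start with B.
first_A = sum(1 for o in orders if o[0] == 'A' and reaches_tie(o))
first_B = sum(1 for o in orders if o[0] == 'B' and reaches_tie(o))
assert first_A == first_B

# Ballot theorem: A leads throughout with probability (n_A - n_B)/(n_A + n_B).
always_lead = sum(1 for o in orders if o[0] == 'A' and not reaches_tie(o))
assert always_lead * (n_A + n_B) == (n_A - n_B) * len(orders)
```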

#### **7 Related Work**

Probabilistic programs have been a long-standing target of formal verification. We compare with two of the most well-developed lines of research: probabilistic model checking and deductive verification via program logics or expectations.

**Probabilistic Model Checking.** Model checking has proven to be a powerful tool for verifying probabilistic programs, capable of automated proofs for various probabilistic properties (typically encoded in probabilistic temporal logics); there are now numerous mature implementations (see, e.g., [21] or [3, Chap. 10] for more details). In comparison, our approach has the advantage of being fully constraint-based. This gives it a number of unique features: (*i*) it applies to programs with unknown inputs and variables over infinite domains; (*ii*) it applies to programs sampling from distributions with parameters, or even ones sampling from unknown distributions modeled as uninterpreted functions in first-order logic; (*iii*) it applies to distributions over infinite domains; and (*iv*) the generated coupling proofs are compact. At the same time, our approach is specialized to the coupling proof technique and is likely to be more incomplete.

**Deductive Verification.** Compared to general deductive verification systems for probabilistic programs, like program logics [5,14,22,26] or techniques reasoning by pre-expectations [25], the main benefit of our technique is automation: deductive verification typically requires an interactive theorem prover to manipulate complex probabilistic invariants. In general, the coupling proof method limits reasoning about probabilities and distributions to just the random sampling commands; in the rest of the program, the proof can avoid quantitative reasoning entirely. As a result, our system can work with non-probabilistic invariants and achieve full automation. Our approach also smoothly handles properties involving the probabilities of multiple events, like probabilistic independence, unlike techniques that analyze probabilistic events one-by-one.

**Acknowledgements.** We thank Samuel Drews, Calvin Smith, and the anonymous reviewers for their helpful comments. Justin Hsu was partially supported by ERC grant #679127 and NSF grant #1637532. Aws Albarghouthi was supported by NSF grants #1566015, #1704117, and #1652140.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Controller Synthesis Made Real: Reach-Avoid Specifications and Linear Dynamics**

Chuchu Fan, Umang Mathur, Sayan Mitra, and Mahesh Viswanathan

> University of Illinois at Urbana-Champaign, Champaign, IL, USA {cfan10,umathur3,mitras,vmahesh}@illinois.edu

**Abstract.** We address the problem of synthesizing provably correct controllers for linear systems with reach-avoid specifications. Our solution uses a combination of an open-loop controller and a tracking controller, thereby reducing the problem to smaller tractable problems. We show that, once a tracking controller is fixed, the reachable states from an initial neighborhood, subject to any disturbance, can be overapproximated by a sequence of ellipsoids, with sizes that are independent of the open-loop controller. Hence, the open-loop controller can be synthesized independently to meet the reach-avoid specification for an initial neighborhood. Exploiting several techniques for tightening the over-approximations, we reduce the open-loop controller synthesis problem to satisfiability over quantifier-free linear real arithmetic. The overall synthesis algorithm computes a tracking controller and then iteratively covers the entire initial set to find open-loop controllers for initial neighborhoods. The algorithm is sound and, for a class of robust systems, is also complete. We present RealSyn, a tool implementing this synthesis algorithm, and we show that it scales to several high-dimensional systems with complex reach-avoid specifications.

#### **1 Introduction**

The controller synthesis question asks whether an input can be generated for a given system (or a plant) so that it achieves a given specification. Algorithms for answering this question hold the promise of automating controller design. They have the potential to yield high-assurance systems that are correct-by-construction, and even negative answers to the question can convey insights about unrealizability of specifications. This is not a new or a solved problem, but there has been a resurgence of interest with the rise of powerful tools and

This work is supported by the grant CCF 1422798 from the National Science Foundation.

© The Author(s) 2018

H. Chockler and G. Weissenbacher (Eds.): CAV 2018, LNCS 10981, pp. 347–366, 2018. https://doi.org/10.1007/978-3-319-96145-3\_19

compelling applications such as vehicle path planning [11], motion control [10,23], circuit design [30], and various other engineering areas.

In this paper, we study synthesis for linear, discrete-time plant models with bounded disturbance—a standard view of control systems [3,17]. We consider *reach-avoid* specifications, which require that starting from any initial state in Θ, the controller has to drive the system to a target set G while avoiding certain unsafe states or obstacles **O**. Reach-avoid specifications arise naturally in many domains such as autonomous and assisted driving, multi-robot coordination, and spacecraft autonomy, and have been studied for linear, nonlinear, as well as stochastic models [7,9,14,18].

Textbook control design methods address specifications like stability, disturbance rejection, asymptotic convergence, but they do not provide formal guarantees about reach-avoid specifications. Another approach is based on *discrete abstraction*, where a discrete, finite-state, symbolic abstraction of the original control system is computed, and a discrete controller is synthesized by solving a two-player game on the abstracted game graph. Theoretically, these methods can be applied to systems with nonlinear dynamics and they can synthesize controllers for a general class of LTL specifications. However, in practice, the discretization step leads to a severe state space explosion for higher dimensional models. Indeed, we did not find any reported evaluation of these tools (see related work) on benchmarks that go beyond 5-dimensional plant models.

In this paper, the controller we synthesize follows a natural paradigm for designing controllers. The approach is to first design an *open-loop* controller for a single initial state $x_0 \in \Theta$ to meet the reach-avoid specification; the resulting execution is called the reference trajectory. For the remaining states in the initial set, a *tracking controller* is added that drives the corresponding trajectories towards the trajectory starting from $x_0$.

However, designing such a combined controller can be computationally expensive [32] because of the interdependency between the open-loop controller and the tracking controller. Our secret sauce in making this approach feasible is to demonstrate that the two controllers can be synthesized in a decoupled way. Our strategy is as follows. We first design a tracking controller using a standard control-theoretic method called LQR (linear quadratic regulator) [5]. The crucial observation that helps decouple the synthesis of the tracking and open-loop controllers is that, once the tracking controller is fixed, the set of states reached from the initial set is contained within a sequence of ellipsoidal sets [24] centered around the reference trajectory. The size of these ellipsoidal sets depends solely on the tracking controller, and is independent of the reference trajectory or the open-loop controller. On the flip side, the open-loop controller and the resulting reference trajectory can be chosen independently of the fixed tracking controller. Based on this, the problem of synthesizing the open-loop controller can be completely decoupled from synthesizing the tracking controller. Our open-loop controller is synthesized by encoding the problem in logic. The straightforward encoding of the synthesis problem results in a ∃∀ formula in the theory of linear arithmetic. Unfortunately, solving large instances of such formulas using current SMT solvers is challenging. To overcome this, we exploit special properties of polytopes and hyper-rectangles, and reduce the original ∃∀-formula into the quantifier-free fragment of linear arithmetic (QF-LRA).

Our overall algorithm (Algorithm 1), after computing an initial tracking controller, iteratively synthesizes open-loop controllers by solving QF-LRA formulas for smaller subsets that cover the initial set. The algorithm will automatically identify the set of initial states for which the combined tracking+open-loop controller is guaranteed to work. Our algorithm is sound (Theorem 1), and for a class of robust linear systems, it is also complete (Theorem 2).

We have implemented the synthesis algorithm in a tool called RealSyn. Any SMT solver can be plugged in for solving the open-loop problem; we present experimental results with Z3, CVC4, and Yices. We report the performance on 24 benchmark problems (using all three solvers). The results show that our approach scales well for complex models—including a system with 84-dimensional dynamics, a system with 3 vehicles (12-dimensional) trying to reach a common goal while avoiding collision with the obstacles and each other, and a system with 10 vehicles (20-dimensional) trying to maintain a platoon. RealSyn usually finds a controller within 10 min with the fastest SMT solver. The closest competing tool, Tulip [13,39], does not return any result even for some of the simpler instances.

**Related Work.** We briefly review related work on formal controller synthesis according to the plant model type, specifications, and approaches.

**Plants and Specifications.** In increasing order of generality, the types of plant models that have been considered for controller synthesis are double-integrator models [10], linear dynamical models [20,28,34,38], piecewise affine models [18,40], and nonlinear (possibly switched) models [7,25,31,33]. There is also a line of work on synthesis approaches for stochastic plants (see [1] and the references therein). With the exceptions noted below, most of these papers consider continuous-time plant models, unlike our work.

There are three classes of specifications typically used for synthesis. In order of generality, they are: (1) pure safety or invariance specifications [2,15,33], (2) reach-avoid [7,14,15,18,33], and (3) more general LTL and GR(1) specifications [16,20,26,38–40]. For each of these classes both bounded and unbounded-time variants have been considered.

**Synthesis Tools.** There is a growing set of controller synthesis algorithms that are available as implemented tools and libraries. This includes tools like CoSyMa [27], Pessoa [30], LTLMop [22,37], Tulip [13,39], SCOTS [31], that rely on the computation of some sort of a discrete (or symbolic) abstraction. Our trial with a 4-dimensional example on Tulip [13,39] did not finish the discretization step in one hour. LTLMop [22,37] handles GR(1) LTL specifications, which are more general than reach-avoid specifications considered in this paper, but it is designed for 2-dimensional robot models working in the Euclidean plane. An alternative synthesis approach generates mode switching sequences for switched system models [19,21,29,35,41] to meet the specifications. This line of work focuses on a finite input space, instead of the infinite input space we are considering in this paper. Abate et al. [2] use a controller template similar to the one considered in this paper for invariant specifications. A counter-example guided inductive synthesis (CEGIS) approach is used to first find a feedback controller for stabilizing the system. Since this feedback controller may not be safe for all initial states of the system, a separate verification step is employed to verify safety, or alternatively find a counter-example. In the latter case, the process is repeated until a valid controller is found. This is different from our approach, where any controller found needs no further verification. Several of the benchmarks are adopted from [2].

#### **2 Preliminaries and Problem Statement**

**Notation.** For a set $A$ and a finite sequence $\sigma$ in $A^*$, we denote the $t$-th element of $\sigma$ by $\sigma[t]$. $\mathbb{R}^n$ is the $n$-dimensional Euclidean space. Given a vector $x \in \mathbb{R}^n$, $x(i)$ is the $i$-th component of $x$. We will use boldfaced letters (for example, **x**, **d**, **u**, etc.) to denote sequences of vectors.

For a vector $x$, $x^\top$ is its transpose. Given an invertible matrix $M \in \mathbb{R}^{n \times n}$, $\|x\|_M \triangleq \sqrt{x^\top M^\top M x}$ is called the *M-norm* of $x$. For $M = I$, $\|x\|_M$ is the familiar 2-norm. Alternatively, $\|x\|_M = \|Mx\|_2$. For a matrix $A$, $A \succ 0$ means $A$ is positive definite. Given two symmetric matrices $A$ and $B$, $A \preceq B$ means $A - B$ is negative semi-definite. Given a matrix $A$ and an invertible matrix $M$ of the same dimension, there exists an $\alpha \geq 0$ such that $A^\top M^\top M A \preceq \alpha M^\top M$. Intuitively, $\alpha$ is the largest scaling factor that can be achieved by the linear transformation from $x$ to $Ax$ when using $M$ for computing the norm, and can be found as the largest eigenvalue of the symmetric matrix $(M A M^{-1})^\top (M A M^{-1})$.
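For 2×2 matrices these facts are easy to check numerically. The sketch below (illustrative values only, not from the paper) verifies the identity $\|x\|_M = \|Mx\|_2$ and the bound $\|Ax\|_M^2 \leq \alpha \|x\|_M^2$, with $\alpha$ computed as the largest eigenvalue of $(MAM^{-1})^\top(MAM^{-1})$:

```python
import math

def mat_mul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_vec(P, x):
    return [sum(P[i][k] * x[k] for k in range(2)) for i in range(2)]

def transpose(P):
    return [[P[j][i] for j in range(2)] for i in range(2)]

def inverse(P):
    det = P[0][0] * P[1][1] - P[0][1] * P[1][0]
    return [[P[1][1] / det, -P[0][1] / det], [-P[1][0] / det, P[0][0] / det]]

def norm2(x):
    return math.sqrt(sum(v * v for v in x))

def m_norm(M, x):              # ||x||_M = ||Mx||_2
    return norm2(mat_vec(M, x))

def largest_eigenvalue_sym(S):  # closed form for a symmetric 2x2 matrix
    tr = S[0][0] + S[1][1]
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    return tr / 2 + math.sqrt(max((tr / 2) ** 2 - det, 0.0))

A = [[0.9, 0.2], [0.0, 0.8]]    # hypothetical dynamic matrix
M = [[2.0, 0.0], [1.0, 1.0]]    # hypothetical invertible shape matrix

N = mat_mul(mat_mul(M, A), inverse(M))                  # M A M^-1
alpha = largest_eigenvalue_sym(mat_mul(transpose(N), N))

for x in ([1.0, 0.0], [0.3, -2.0], [5.0, 4.0]):
    assert abs(m_norm(M, x) - norm2(mat_vec(M, x))) < 1e-9
    assert m_norm(M, mat_vec(A, x)) ** 2 <= alpha * m_norm(M, x) ** 2 + 1e-9
```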

Given a vector $c \in \mathbb{R}^n$, an invertible matrix $M$, and a scalar value $r \geq 0$, we define $\mathcal{E}_r(c, M) \triangleq \{x \mid \|x - c\|_M \leq r\}$ to be the ellipsoid centered at $c$ with radius $r$ and shape $M$. $\mathcal{B}_r(c) \triangleq \mathcal{E}_r(c, I)$ is the ball of radius $r$ centered at $c$. Given two vectors $c, v \in \mathbb{R}^n$, $\mathcal{R}_v(c) \triangleq \{x \mid \bigwedge_{i=1}^{n} c(i) - v(i) \leq x(i) \leq c(i) + v(i)\}$ is the rectangle centered at $c$ with the length vector $v$. For a set $S \subseteq \mathbb{R}^n$, a vector $v \in \mathbb{R}^n$, and a matrix $M \in \mathbb{R}^{n \times n}$ we define $v \oplus S \triangleq \{x + v \mid x \in S\}$ and $M \otimes S \triangleq \{Mx \mid x \in S\}$. We say a set $S \subseteq \mathbb{R}^n$ is a polytope if there is a matrix $A \in \mathbb{R}^{m \times n}$ and a vector $b \in \mathbb{R}^m$ such that $S = \{x \mid Ax \leq b\}$, and denote by $\mathit{vert}(S)$ the set of vertices of $S$.
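Membership in these sets is straightforward to implement; a small sketch with hypothetical 2D data:

```python
import math

def in_ball(c, r, x):          # B_r(c) = E_r(c, I): Euclidean ball
    return math.dist(x, c) <= r

def in_rect(c, v, x):          # R_v(c): rectangle centered at c, lengths v
    return all(c[i] - v[i] <= x[i] <= c[i] + v[i] for i in range(len(c)))

def translate(v, S):           # v (+) S for a finite set of points
    return [[p[i] + v[i] for i in range(len(v))] for p in S]

assert in_ball([0, 0], 1.0, [0.6, 0.8])        # distance exactly 1
assert not in_ball([0, 0], 1.0, [0.9, 0.9])
assert in_rect([1, 1], [0.5, 2.0], [1.4, -0.5])
assert translate([1, 0], [[0, 0], [2, 3]]) == [[1, 0], [3, 3]]
```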

#### **2.1 Discrete Time Linear Control Systems**

An $(n, m)$-*dimensional discrete-time linear system* A is a 5-tuple $\langle A, B, \Theta, U, D \rangle$, where (i) $A \in \mathbb{R}^{n \times n}$ is called the *dynamic matrix*, (ii) $B \in \mathbb{R}^{n \times m}$ is called the *input matrix*, (iii) $\Theta \subseteq \mathbb{R}^n$ is a *set of initial states*, (iv) $U \subseteq \mathbb{R}^m$ is the *space of inputs*, and (v) $D \subseteq \mathbb{R}^n$ is the *space of disturbances*.

A *control sequence* for an $(n, m)$-dimensional system A is a (possibly infinite) sequence **u** = **u**[0], **u**[1],..., where each **u**[t] ∈ U. Similarly, a *disturbance sequence* for A is a (possibly infinite) sequence **d** = **d**[0], **d**[1],..., where each **d**[t] ∈ D. Given control **u** and disturbance **d**, and an initial state **x**[0] ∈ Θ, the *execution of* A is uniquely defined as the (possibly infinite) sequence of states **x** = **x**[0], **x**[1],... , where for each $t \geq 0$,

$$\mathbf{x}[t+1] = A\mathbf{x}[t] + B\mathbf{u}[t] + \mathbf{d}[t].\tag{1}$$
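A sketch of Eq. (1) in code, for a hypothetical 2-dimensional double-integrator plant with zero disturbance (all values ours, for illustration):

```python
# x[t+1] = A x[t] + B u[t] + d[t] for a 2D system:
# state = (position, velocity), scalar input = acceleration.
A = [[1.0, 1.0], [0.0, 1.0]]
B = [[0.0], [1.0]]

def step(x, u, d):
    return [A[i][0] * x[0] + A[i][1] * x[1] + B[i][0] * u + d[i]
            for i in range(2)]

x = [0.0, 0.0]
controls = [1.0, 0.0, -1.0]          # a fixed (open-loop) control sequence
for u in controls:
    x = step(x, u, [0.0, 0.0])       # zero disturbance
assert x == [2.0, 0.0]               # position 2, velocity back to 0
```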

A *(state feedback) controller* for A is a function $g : \Theta \times \mathbb{R}^n \to \mathbb{R}^m$ that maps an initial state and a (current) state to an input. That is, given an initial state $x_0 \in \Theta$ and state $x \in \mathbb{R}^n$ at time $t$, the control input to the plant at time $t$ is:

$$\mathbf{u}[t] = g(x\_0, x). \tag{2}$$

This controller is allowed to use the memory of some initial state x<sup>0</sup> (not necessarily the current execution's initial state) for deciding the current state-dependent feedback. Thus, given an initial state **x**[0], a disturbance **d**, and a state feedback controller g, Eqs. (1) and (2) define a unique execution **x** of A. A state x is *reachable in t-steps* if there exists an execution **x** of A such that **x**[t] = x. The set of all reachable states from S ⊆ Θ in exactly T steps using the controller g is denoted by ReachA,g(S, T). When A and g are clear from the context, we write Reach(S, T).

#### **2.2 Bounded Controller Synthesis Problem**

Given an $(n, m)$-dimensional discrete-time linear system $\mathcal{A}$, a sequence **O** of *obstacles* or unsafe sets (with $\mathbf{O}[t] \subseteq \mathbb{R}^n$ for each $t$), a *goal* $G \subseteq \mathbb{R}^n$, and a time bound $T$, the *bounded time controller synthesis problem* is to find a state feedback controller $g$ such that for every initial state $\theta \in \Theta$ and every disturbance $\mathbf{d} \in D^T$, the unique execution **x** of $\mathcal{A}$ with $g$, starting from $\mathbf{x}[0] = \theta$, satisfies (i) for all $t \leq T$, $\mathbf{u}[t] \in U$, (ii) for all $t \leq T$, $\mathbf{x}[t] \notin \mathbf{O}[t]$, and (iii) $\mathbf{x}[T] \in G$.

For the rest of the paper, we assume that each of the sets in $\{\mathbf{O}[t]\}_{t \in \mathbb{N}}$, $G$, and $U$ is a closed polytope. Moreover, we assume that the pair $(A, B)$ is controllable [3].

**Example.** *Consider a mobile robot that needs to reach the green area of an apartment starting from the entrance area, while avoiding the gray areas (Fig. 1). The robot's dynamics is described by a linear model (for example the navigation model from [12]). The obstacle sequence* **O** *here is static, that is,* **O**[t] = **O**[0] *for all* t ≥ 0*. Both* Θ *and* G *are rectangles. Although these sets are depicted in 2D, the dynamics of the robot may involve a higher dimensional state space.*

**Fig. 1.** The settings for controller synthesis of a mobile robot with reach-avoid specification.

*In this example, there is no disturbance, but a similar problem can be formulated for a drone flying outdoors, in which case the disturbance input would model the effect of wind. Time-varying obstacle sets are useful for modeling safety requirements of multi-robot systems.*

#### **3 Synthesis Algorithm**

#### **3.1 Overview**

The controller synthesis problem requires one to find a state feedback controller that ensures that the trajectory starting from any initial state in Θ meets the reach-avoid specification. Since the set of initial states Θ is typically infinite, the synthesized feedback controller $g$ must have an effective representation. Thus, an "enumerative" representation, where a (separate) *open-loop controller* is constructed for each initial state, is not feasible; by an open-loop controller for initial state $x_0 \in \Theta$, we mean a control sequence **u** such that the corresponding execution **x** with $\mathbf{x}[0] = x_0$ and 0 disturbance satisfies the reach-avoid constraints. We therefore need a useful template that will serve as the representation for the feedback controller.

In control theory, one natural controller design paradigm is to first find a *reference execution* **x**ref which uses an open-loop controller, then add a *tracking controller* which tries to force other executions **x** starting from different initial states **x**[0] to get close to **x**ref by minimizing the distance between **x**ref and **x**. This form of controller combining open-loop control with tracking control is also proposed in [32] for reach-avoid specifications. The resulting trajectory under a combination of tracking controller plus reference trajectory can be described by the following system of equations.

$$\begin{cases} \mathbf{u}[t] = \mathbf{u}\_{\text{ref}}[t] + K(\mathbf{x}[t] - \mathbf{x}\_{\text{ref}}[t]), \text{with} \\ \mathbf{x}\_{\text{ref}}[t+1] = A \mathbf{x}\_{\text{ref}}[t] + B \mathbf{u}\_{\text{ref}}[t] \end{cases} \tag{3}$$

The tracking controller is given by the matrix $K$ that determines the additive component of the input based on the difference between the current state and the reference trajectory. Once $\mathbf{x}_{\text{ref}}[0]$ and the open-loop control sequence $\mathbf{u}_{\text{ref}}$ are fixed, the value of $\mathbf{x}_{\text{ref}}[t]$ is determined at each time step $t \in \mathbb{N}$. Therefore, the controller $g$ is uniquely defined by the tuple $\langle K, \mathbf{x}_{\text{ref}}[0], \mathbf{u}_{\text{ref}}\rangle$. We can rewrite the linear system in (3) as an augmented system:

$$
\begin{bmatrix} \mathbf{x} \\ \mathbf{x}_{\text{ref}} \end{bmatrix}[t+1] = \begin{bmatrix} A + BK & -BK \\ 0 & A \end{bmatrix} \begin{bmatrix} \mathbf{x} \\ \mathbf{x}_{\text{ref}} \end{bmatrix}[t] + \begin{bmatrix} B & 0 \\ 0 & B \end{bmatrix} \begin{bmatrix} \mathbf{u}_{\text{ref}} \\ \mathbf{u}_{\text{ref}} \end{bmatrix}[t] + \begin{bmatrix} \mathbf{d} \\ 0 \end{bmatrix}[t].
$$

This can be rewritten as $\hat{\mathbf{x}}[t+1] = \hat{A}\hat{\mathbf{x}}[t] + \hat{B}\hat{\mathbf{u}}[t] + \hat{\mathbf{d}}[t]$. The closed-form solution is $\hat{\mathbf{x}}[t] = \hat{A}^t \hat{\mathbf{x}}[0] + \sum_{i=0}^{t-1} \hat{A}^{t-1-i} (\hat{B}\hat{\mathbf{u}}[i] + \hat{\mathbf{d}}[i])$. Synthesizing a controller $g$ of this form therefore requires finding $\langle K, \mathbf{x}_{\text{ref}}[0], \mathbf{u}_{\text{ref}}\rangle$ such that the closed-form solution meets the reach-avoid specification. This is indeed the approach followed in [32], albeit in the continuous-time setting. Observe that in the closed-form solution, $\hat{A}$, $\hat{\mathbf{u}}$, and $\hat{\mathbf{x}}[0]$ all depend on parameters that we need to synthesize. Therefore, solving such constraints involves polynomials whose degrees grow with the time bound. This is very expensive and unlikely to scale to large dimensions and time bounds.

In this paper, to achieve scalability, we take a slightly different approach from the one where $K$, $\mathbf{x}_{\text{ref}}[0]$, and $\mathbf{u}_{\text{ref}}$ are simultaneously synthesized. We first synthesize a tracking controller $K$, *independent* of $\mathbf{x}_{\text{ref}}[0]$ and $\mathbf{u}_{\text{ref}}$, using the standard LQR method. Once $K$ is synthesized, we show that, no matter what $\mathbf{x}_{\text{ref}}[0]$ and $\mathbf{u}_{\text{ref}}$ are, the state of the system at time $t$ starting from $x_0$ is guaranteed to be contained within an ellipsoid centered at $\mathbf{x}_{\text{ref}}[t]$, with a radius that depends only on $K$, the initial distance between $x_0$ and $\mathbf{x}_{\text{ref}}[0]$, the time $t$, and the disturbance. Moreover, this radius is only a *linear* function of the initial distance (Lemma 1). Thus, if we can synthesize an open-loop controller $\mathbf{u}_{\text{ref}}$ starting from some state $\mathbf{x}_{\text{ref}}[0]$, such that the ellipsoids centered around $\mathbf{x}_{\text{ref}}$ satisfy the reach-avoid specification, we can conclude that the combined controller works correctly for all initial states in some ball around $\mathbf{x}_{\text{ref}}[0]$. The radius of the ball around $\mathbf{x}_{\text{ref}}[0]$ for which the controller is guaranteed to work depends on the radii of the ellipsoids around $\mathbf{x}_{\text{ref}}$ that satisfy the reach-avoid specification. This decoupled approach to synthesis is the first key idea in our algorithm.

Following the above discussion, crucial to the success of the decoupled approach is to obtain a tight characterization of the radius of the ellipsoid around **x**ref[t] that contains the reach set, as a function of the initial distance — too conservative a bound will imply that the combined controller only works for a tiny set of initial states. The ellipsoid's shape and direction, which is characterized by a coordinate transformation matrix M, also affect the tightness of the over-approximations. We determine the shape and direction of the ellipsoids that give us the tightest over-approximation using an SDP solver (Sect. 3.4).

Synthesizing the tracking controller $K$ still leaves open the problem of synthesizing an open-loop controller for an initial state $\mathbf{x}_{\text{ref}}[0]$. A straightforward encoding of the problem of synthesizing an open-loop controller that works for all initial states in some ball around $\mathbf{x}_{\text{ref}}[0]$ results in an ∃∀-formula in the theory of real arithmetic. Unfortunately, solving such formulas does not scale to large dimensional systems with current SMT solvers. The next key idea in our algorithm is to simplify these constraints. By exploiting special properties of polytopes and hyper-rectangles, we reduce the original ∃∀-formula to the *quantifier-free* fragment of *linear* real arithmetic (QF-LRA) (Sect. 3.5).

Putting it all together, the overall algorithm (Algorithm 1) works as follows. After computing an initial tracking controller $K$ and a coordinate transformation $M$ for optimal ellipsoidal approximation of reach-sets, it synthesizes open-loop controllers for different initial states by solving QF-LRA formulas. After each open-loop controller is synthesized, the algorithm identifies the set of initial states for which the combined tracking + open-loop controller is guaranteed to work, and removes this set from Θ. In each new iteration, it picks a new initial state not covered by previous combined controllers, and the process terminates when all of Θ is covered. Our algorithm is sound (Theorem 1): whenever a controller is synthesized, it meets the specifications. Further, for robust systems (defined later in the paper), our algorithm is guaranteed to terminate whenever the system has a combined controller for all initial states (Theorem 2).

#### **3.2 Synthesizing the Tracking Controller K**

Given any open-loop controller $\mathbf{u}_{\text{ref}}$ and the corresponding reference execution $\mathbf{x}_{\text{ref}}$, substituting the controller of Eq. (3) into Eq. (1) yields:

$$\mathbf{x}[t+1] = (A+BK)\mathbf{x}[t] - BK\mathbf{x}\_{\text{ref}}[t] + B\mathbf{u}\_{\text{ref}}[t] + \mathbf{d}[t].\tag{4}$$

Subtracting $\mathbf{x}_{\text{ref}}[t+1]$ from both sides, we have that for any execution **x** starting from the initial state **x**[0] and with disturbance **d**, the distance between **x** and $\mathbf{x}_{\text{ref}}$ changes with time as:

$$\mathbf{x}[t+1] - \mathbf{x}\_{\text{ref}}[t+1] = (A + BK)(\mathbf{x}[t] - \mathbf{x}\_{\text{ref}}[t]) + \mathbf{d}[t].\tag{5}$$

With $A_c \stackrel{\Delta}{=} A + BK$ and $\mathbf{y}[t] \stackrel{\Delta}{=} \mathbf{x}[t] - \mathbf{x}_{\text{ref}}[t]$, Eq. (5) becomes $\mathbf{y}[t+1] = A_c \mathbf{y}[t] + \mathbf{d}[t]$. We want **x**[t] to be as close to $\mathbf{x}_{\text{ref}}[t]$ as possible, which means $K$ should be designed to make $|\mathbf{y}[t]|$ converge to 0. Equivalently, $K$ should be designed as a linear feedback controller such that $A_c$ is stable<sup>1</sup>. Such a matrix $K$ can be computed using classical control-theoretic methods. In this work, we compute $K$ as a linear (stable) feedback controller using LQR, as stated in the following proposition.

**Proposition 1 (LQR).** *For linear system $\mathcal{A}$ with $(A, B)$ controllable and 0 disturbance, fix any $Q, R \succ 0$ and let $J \stackrel{\Delta}{=} \mathbf{x}^{\mathsf{T}}[T] Q \mathbf{x}[T] + \sum_{i=0}^{T-1} \left(\mathbf{x}^{\mathsf{T}}[i] Q \mathbf{x}[i] + \mathbf{u}^{\mathsf{T}}[i] R \mathbf{u}[i]\right)$ be the corresponding quadratic cost. Let $X$ be the unique positive definite solution to the discrete-time algebraic Riccati equation (ARE): $A^{\mathsf{T}} X A - X - A^{\mathsf{T}} X B (B^{\mathsf{T}} X B + R)^{-1} B^{\mathsf{T}} X A + Q = 0$, and $K \stackrel{\Delta}{=} -(B^{\mathsf{T}} X B + R)^{-1} B^{\mathsf{T}} X A$. Then $A + BK$ is stable, and the corresponding feedback input minimizes $J$.*

Methods for choosing Q and R are outside the scope of this paper. We fix Q and R to be identity matrices for most examples. Roughly, for a given R, scaling up Q results in a K that makes an execution **x** converge faster to the reference execution **x**ref.
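Proposition 1 can be realized with a few lines of numerical code. The sketch below computes the gain $K$ by iterating the discrete-time Riccati recursion to a fixed point; this fixed-point iteration and the 2-dimensional example system are illustrative assumptions (a library routine such as SciPy's discrete ARE solver could be used instead):

```python
# Sketch: compute the LQR gain K of Proposition 1 by iterating the
# discrete-time Riccati recursion until X approximates the ARE solution.
# The example system below is an illustrative assumption.
import numpy as np

def lqr_gain(A, B, Q, R, iters=500):
    X = Q.copy()
    for _ in range(iters):
        S = np.linalg.inv(B.T @ X @ B + R)
        # one step of the Riccati recursion toward the ARE fixed point
        X = A.T @ X @ A - A.T @ X @ B @ S @ B.T @ X @ A + Q
    # K as in Proposition 1; A + BK should then be stable
    return -np.linalg.inv(B.T @ X @ B + R) @ B.T @ X @ A

A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
K = lqr_gain(A, B, np.eye(2), np.eye(1))
rho = max(abs(np.linalg.eigvals(A + B @ K)))  # spectral radius of A + BK
```

A quick sanity check is that the spectral radius of $A + BK$ comes out below 1, i.e., the closed-loop matrix is stable in the sense of the footnote above.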

#### **3.3 Reachset Over-Approximation with Tracking Controller**

We present a method for over-approximating the reachable states of the system for a given tracking controller $K$ (computed as in Proposition 1) and an open-loop controller $\mathbf{u}_{\text{ref}}$ (to be computed in Sect. 3.5).

**Lemma 1.** *Consider any $K \in \mathbb{R}^{m \times n}$, any initial set $S \subseteq \mathcal{E}_{r_0}(\mathbf{x}_{\text{ref}}[0], M)$, and disturbance $D \subseteq \mathcal{E}_{\delta}(0, M)$, where $r_0, \delta \geq 0$ and $M \in \mathbb{R}^{n \times n}$ is invertible.*

*For any open-loop controller $\mathbf{u}_{\text{ref}}$ and the corresponding reference execution $\mathbf{x}_{\text{ref}}$,*

$$\mathit{Reach}(S, t) \subseteq \mathcal{E}_{r_t}(\mathbf{x}_{\text{ref}}[t], M), \quad \forall\, t \leq T, \tag{6}$$

*where $r_t = \alpha^{\frac{t}{2}}\, r_0 + \sum_{i=0}^{t-1} \alpha^{\frac{i}{2}}\, \delta$, and $\alpha \geq 0$ is such that $(A+BK)^{\mathsf{T}} M^{\mathsf{T}} M (A+BK) \preceq \alpha M^{\mathsf{T}} M$.*

<sup>1</sup> A + BK has spectral radius ρ(A + BK) < 1.

Lemma 1 can be proved using the triangle inequality for the norm of Eq. (5). From Lemma 1, it follows that given an open-loop controller $\mathbf{u}_{\text{ref}}$ and the corresponding reference trajectory $\mathbf{x}_{\text{ref}}$, the reachable states from $S \subseteq \mathcal{E}_{r_0}(\mathbf{x}_{\text{ref}}[0], M)$ at time $t$ can be over-approximated by an ellipsoid centered at $\mathbf{x}_{\text{ref}}[t]$ with size $r_t \stackrel{\Delta}{=} \alpha^{\frac{t}{2}}\, r_0 + \sum_{i=0}^{t-1} \alpha^{\frac{i}{2}}\, \delta$. Here $M$ is any invertible matrix that defines the shape of the ellipsoid, and it influences the value of $\alpha$. As the over-approximation ($r_t$) grows exponentially with $t$ when $\alpha > 1$, it makes sense to choose $M$ in a way that makes $\alpha$ small. In the next section, we discuss how $M$ and $\alpha$ are chosen to achieve this.
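The closed form for $r_t$ in Lemma 1 is equivalent to the one-step recurrence $r_{t+1} = \sqrt{\alpha}\, r_t + \delta$, which makes the role of $\alpha$ easy to see numerically. The sketch below uses illustrative values of $\alpha$, $r_0$, and $\delta$ (assumptions, not taken from the paper's benchmarks):

```python
# Sketch: radii r_t of the over-approximating ellipsoids of Lemma 1,
# via the recurrence r_{t+1} = sqrt(alpha) * r_t + delta, which is
# equivalent to the closed form r_t = a^(t/2) r0 + sum_{i<t} a^(i/2) d.
# The numeric values of alpha, r0, delta are illustrative assumptions.
from math import sqrt

def radii(alpha, r0, delta, T):
    rs = [r0]
    for _ in range(T):
        rs.append(sqrt(alpha) * rs[-1] + delta)
    return rs

contracting = radii(alpha=0.81, r0=1.0, delta=0.01, T=20)  # alpha < 1: bounded
expanding   = radii(alpha=1.21, r0=1.0, delta=0.01, T=20)  # alpha > 1: blows up
```

For $\alpha < 1$ the radii stay bounded (they approach $\delta / (1 - \sqrt{\alpha})$), while for $\alpha > 1$ they grow exponentially, which is exactly why the next section works to minimize $\alpha$.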

#### **3.4 Shaping Ellipsoids for Tight Over-Approximating Hyper-rectangles**

The choice of $M$ and the resulting $\alpha$ may seem like a minor detail, but a bad choice here can doom the rest of the algorithm to be impractical. For example, if we fix $M$ to be the identity matrix $I$, the resulting value of $\alpha$ may give over-approximations that are too conservative. Even if the actual executions converge to $\mathbf{x}_{\text{ref}}$, the resulting over-approximation can blow up exponentially.

We find the smallest exponential convergence/divergence rate (α) by solving for P in the following semi-definite program (SDP):

$$\begin{array}{ll}\min\_{P\succ 0, \alpha \in \mathbb{R}} & \alpha\\ \text{s.t.} & (A+BK)^{\mathsf{T}}P(A+BK) \preceq \alpha P. \end{array} \tag{7}$$

This gives $M$ as a matrix such that $P = M^{\mathsf{T}} M$ (for instance, via the Cholesky decomposition of $P$).

In the rest of the paper, the reachset over-approximations are represented by hyper-rectangles, to allow us to use existing SMT solvers efficiently. That is, the ellipsoids given by Lemma 1 have to be bounded by hyper-rectangles. For any coordinate transformation matrix $M$, the unit-size ellipsoid satisfies $\mathcal{E}_1(0, M) \subseteq \mathcal{R}_v(0)$, with $v(i) = \max_{x \in \mathcal{E}_1(0, M)} x(i)$. Each $v(i)$ is also computed by solving an SDP. Similarly, $\mathcal{E}_r(0, M) \subseteq \mathcal{R}_{rv}(0)$. Therefore, from Lemma 1, it follows that $\mathit{Reach}(S, t) \subseteq \mathcal{R}_{r_t v}(\mathbf{x}_{\text{ref}}[t])$ with $r_t = \alpha^{\frac{t}{2}}\, r_0 + \sum_{i=0}^{t-1} \alpha^{\frac{i}{2}}\, \delta$, where $v$ is the size vector of the rectangle bounding $\mathcal{E}_1(0, M)$. These optimization problems for computing $M$, $\alpha$, and $v$ have to be solved once per synthesis problem.
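A feasible (though not SDP-optimal) pair $(\alpha, P)$ can also be obtained without an SDP solver: for any $\alpha$ strictly above $\rho(A+BK)^2$, the scaled Lyapunov series $P = \sum_k ((A+BK)/\sqrt{\alpha})^{{\mathsf{T}}k} \, ((A+BK)/\sqrt{\alpha})^{k}$ converges and satisfies the constraint of Eq. (7). The sketch below uses this shortcut; the shortcut itself and the example closed-loop matrix are assumptions for illustration, whereas the paper minimizes $\alpha$ with an SDP:

```python
# Sketch: a feasible (alpha, P) for the constraint of Eq. (7) without an
# SDP solver. For alpha > rho(Ac)^2, the series sum_k (As^k)^T (As^k)
# with As = Ac / sqrt(alpha) converges and gives Ac^T P Ac <= alpha P.
# This shortcut and the example matrix Ac are illustrative assumptions.
import numpy as np

def shape_matrix(Ac, slack=1.05, terms=200):
    rho = max(abs(np.linalg.eigvals(Ac)))
    alpha = slack * rho**2                   # feasible rate, slightly above optimum
    As = Ac / np.sqrt(alpha)                 # spectral radius < 1: series converges
    P = np.zeros_like(Ac)
    Apow = np.eye(Ac.shape[0])
    for _ in range(terms):
        P += Apow.T @ Apow                   # accumulate (As^k)^T (As^k)
        Apow = Apow @ As
    M = np.linalg.cholesky(P).T              # P = M^T M
    v = np.sqrt(np.diag(np.linalg.inv(P)))   # bounding box of E_1(0, M)
    return alpha, M, v

Ac = np.array([[0.5, 0.1], [0.0, 0.6]])      # a stable closed-loop A + BK (assumed)
alpha, M, v = shape_matrix(Ac)
P = M.T @ M
# sanity check: alpha * P - Ac^T P Ac is positive semidefinite
lmi_ok = np.all(np.linalg.eigvalsh(alpha * P - Ac.T @ P @ Ac) >= -1e-9)
```

The bounding-box vector uses the standard fact that the extent of the ellipsoid $\{x \mid x^{\mathsf{T}} P x \leq 1\}$ along dimension $i$ is $\sqrt{(P^{-1})_{ii}}$.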

**Example.** *Continuing the previous example, suppose the robot is asked to reach the target set in* 20 *steps. Figure 2 shows the projection of the reachset onto the robot's position with the synthesized controller. The curves are the reference executions* **x***ref from 2 initial covers, and the rectangles are reachset over-approximations such that every execution of the system starting from each initial cover is guaranteed to be inside the rectangles at each time step.*

**Fig. 2.** Robot's position with the synthesized controllers using Algorithm 1.

#### **3.5 Synthesis of Open-Loop Controller**

In this section, we discuss the synthesis of the open-loop controller $\mathbf{u}_{\text{ref}}$ in $\langle K, \mathbf{x}_{\text{ref}}[0], \mathbf{u}_{\text{ref}}\rangle$. From the previous section, we know that given an initial set $S$, a tracking controller $K$, and an open-loop controller $\mathbf{u}_{\text{ref}}$, the reachable set (under any disturbance) at time $t$ is over-approximated by $\mathcal{R}_{r_t v}(\mathbf{x}_{\text{ref}}[t])$. Thus, once we fix $K$ and $\mathbf{x}_{\text{ref}}[0]$, the problem of synthesizing a controller reduces to the problem of synthesizing an appropriate $\mathbf{u}_{\text{ref}}$ such that the reachset over-approximations meet the reach-avoid specification. Indeed, for the rest of the presentation, we assume a fixed $K$.

For synthesizing $\mathbf{u}_{\text{ref}}$, we would like to formalize the problem in terms of constraints that allow us to use SMT solvers. In the following, we describe how this problem can be formalized as a quantifier-free first-order formula over the theory of reals. We then lay out specific assumptions and/or simplifications required to reduce the problem to the QF-LRA theory, which is implemented efficiently in existing state-of-the-art SMT solvers. Most SMT solvers also provide the functionality of explicit model generation, and the concrete controller values can be read off from the models generated when the constraints are satisfiable.

**Constraints for Synthesizing $\mathbf{u}_{\text{ref}}$.** Let us fix an initial state $x_0$ and a radius $r$, defining a set of initial states $S = \mathcal{B}_r(x_0)$. The $\mathbf{u}_{\text{ref}}$ synthesis problem can be stated as finding satisfying solutions for the formula $\phi_{\text{synth}}(x_0, r)$.

$$\begin{array}{rl} \phi_{\text{synth}}(x_0, r) \stackrel{\Delta}{=} & \exists\, \mathbf{u}_{\text{ref}}[0], \mathbf{u}_{\text{ref}}[1], \ldots, \mathbf{u}_{\text{ref}}[T-1],\ \exists\, \mathbf{x}_{\text{ref}}[0], \mathbf{x}_{\text{ref}}[1], \ldots, \mathbf{x}_{\text{ref}}[T], \\ & \phi_{\text{control}}(\mathbf{u}_{\text{ref}}) \land \phi_{\text{execution}}(\mathbf{u}_{\text{ref}}, \mathbf{x}_{\text{ref}}, x_0) \\ & \land\ \phi_{\text{avoid}}(x_0, r, \mathbf{u}_{\text{ref}}, \mathbf{x}_{\text{ref}}) \land \phi_{\text{reach}}(x_0, r, \mathbf{u}_{\text{ref}}, \mathbf{x}_{\text{ref}}) \end{array} \tag{8}$$

where $\phi_{\text{control}}$ constrains the space of inputs, $\phi_{\text{execution}}$ states that the sequence $\mathbf{x}_{\text{ref}}$ is a reference execution following Eq. (3), $\phi_{\text{avoid}}$ specifies the safety constraint, and $\phi_{\text{reach}}$ specifies that the system reaches $G$:

$$\begin{array}{rl} \phi_{\text{control}}(\mathbf{u}_{\text{ref}}) \stackrel{\Delta}{=} & \bigwedge_{t=0}^{T-1} \mathbf{u}_{\text{ref}}[t] \oplus \left( K \otimes \mathcal{R}_{r_t v}(0) \right) \subseteq U \\ \phi_{\text{execution}}(\mathbf{u}_{\text{ref}}, \mathbf{x}_{\text{ref}}, x_0) \stackrel{\Delta}{=} & (\mathbf{x}_{\text{ref}}[0] = x_0) \land \bigwedge_{t=0}^{T-1} (\mathbf{x}_{\text{ref}}[t+1] = A \mathbf{x}_{\text{ref}}[t] + B \mathbf{u}_{\text{ref}}[t]) \\ \phi_{\text{avoid}}(x_0, r, \mathbf{u}_{\text{ref}}, \mathbf{x}_{\text{ref}}) \stackrel{\Delta}{=} & \bigwedge_{t=0}^{T} \mathcal{R}_{r_t v}(\mathbf{x}_{\text{ref}}[t]) \cap \mathbf{O}[t] = \emptyset \\ \phi_{\text{reach}}(x_0, r, \mathbf{u}_{\text{ref}}, \mathbf{x}_{\text{ref}}) \stackrel{\Delta}{=} & \mathcal{R}_{r_T v}(\mathbf{x}_{\text{ref}}[T]) \subseteq G. \end{array} \tag{9}$$

As discussed in Sect. 3.2, the vector v and the constants r0,...,r<sup>T</sup> are precomputed using the radius r of the initial ball.

We make a few remarks about this formulation. First, each of the formulas $\phi_{\text{control}}$, $\phi_{\text{avoid}}$, and $\phi_{\text{reach}}$ represents a sufficient condition for the existence of $\mathbf{u}_{\text{ref}}$. Second, the constraints stated above belong to the (decidable) theory of reals. However, $\phi_{\text{control}}$, $\phi_{\text{avoid}}$, and $\phi_{\text{reach}}$, and thus $\phi_{\text{synth}}$, are not quantifier-free, as they use subset and disjointness checks. This is because for sets $S, T$ expressed as predicates $\varphi_S(\cdot)$ and $\varphi_T(\cdot)$, $S \cap T = \emptyset$ corresponds to the formula $\forall x \cdot \neg(\varphi_S(x) \land \varphi_T(x))$, and $S \subseteq T$ (or equivalently $S \cap T^c = \emptyset$) corresponds to the formula $\forall x \cdot \varphi_S(x) \implies \varphi_T(x)$.

**Reduction to QF-LRA.** Since the sets $G$ and $U$ are bounded polytopes, $G^c$ and $U^c$ can be expressed as finite unions of (possibly unbounded) polytopes. Thus, the subset predicates $\mathbf{u}_{\text{ref}}[t] \oplus (K \otimes \mathcal{R}_{r_t v}(0)) \subseteq U$ in $\phi_{\text{control}}$ and $\mathcal{R}_{r_t v}(\mathbf{x}_{\text{ref}}[t]) \subseteq G$ in $\phi_{\text{reach}}$ can be expressed as a conjunction over finitely many predicates, each expressing the disjointness of two polytopes.

The central idea behind eliminating the universal quantification in the disjointness predicates in $\phi_{\text{avoid}}$, and in the inferred disjointness predicates in $\phi_{\text{reach}}$ and $\phi_{\text{control}}$, is to find a separating hyperplane that witnesses the disjointness of two polytopes. Let $P_1 = \{x \mid A_1 x \leq b_1\}$ and $P_2 = \{x \mid A_2 x \leq b_2\}$ be two polytopes such that $P_1$ is closed and bounded. If there is an $i$ for which each vertex $v$ of $P_1$ satisfies $A_2^{(i)} v > b_2(i)$, where $A_2^{(i)}$ is the $i$th row vector of the matrix $A_2$, then we must have $P_1 \cap P_2 = \emptyset$. That is, such a check is sufficient to ensure disjointness. Thus, in the formula $\phi_{\text{avoid}}$, in order to check that $\mathcal{R}_{r_t v}(\mathbf{x}_{\text{ref}}[t])$ does not intersect $\mathbf{O}[t]$, we check whether there is a face of the polytope $\mathbf{O}[t]$ such that all the vertices of $\mathcal{R}_{r_t v}(\mathbf{x}_{\text{ref}}[t])$ lie on the other side of the face. The same holds for each of the inferred predicates in $\phi_{\text{reach}}$ and $\phi_{\text{control}}$. Eliminating quantifiers is essential for scaling our analysis to large, high dimensional systems.
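The face-based check can be sketched directly on concrete numbers. This is a standalone illustration of the sufficient condition, not RealSyn's encoding; in the actual SMT encoding, the rectangle's vertex coordinates are affine in the unknowns $\mathbf{x}_{\text{ref}}[t]$, so the same inequalities remain linear inside the query:

```python
# Sketch: sufficient disjointness check between a hyper-rectangle and a
# polytope {x | A2 x <= b2}: look for one face i such that every
# rectangle vertex v satisfies A2^(i) v > b2(i). Concrete numbers are
# illustrative; the paper's encoding keeps these inequalities symbolic.
from itertools import product
import numpy as np

def rect_vertices(lo, hi):
    return [np.array(p) for p in product(*zip(lo, hi))]

def disjoint_by_face(lo, hi, A2, b2):
    verts = rect_vertices(lo, hi)
    # disjunction over faces; conjunction over the 2^n rectangle vertices
    return any(all(A2[i] @ v > b2[i] for v in verts) for i in range(len(b2)))

# obstacle: unit square [0,1]^2 as {x | x1<=1, -x1<=0, x2<=1, -x2<=0}
A2 = np.array([[1, 0], [-1, 0], [0, 1], [0, -1]], dtype=float)
b2 = np.array([1, 0, 1, 0], dtype=float)
sep = disjoint_by_face([2.0, 0.2], [3.0, 0.8], A2, b2)      # separated rectangle
overlap = disjoint_by_face([0.5, 0.5], [1.5, 1.5], A2, b2)  # intersecting rectangle
```

Note that the check is only sufficient: two disjoint sets may fail it when no single face separates them, which is acceptable here because the formula only needs a sound witness of disjointness.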

Further, when the set $G$ has a hyper-rectangle representation, the containment check $\mathcal{R}_{r_T v}(\mathbf{x}_{\text{ref}}[T]) \subseteq G$ can be directly encoded as a conjunction of $O(n)$ linear inequalities stating that, for each dimension $i$, the lower and upper bounds of $\mathcal{R}_{r_T v}(\mathbf{x}_{\text{ref}}[T])$ in the $i$th dimension satisfy $l'_i \leq l_i \leq u_i \leq u'_i$, where $l'_i$ and $u'_i$ represent the bounds for $G$ in the $i$th dimension. Similarly, when $\mathbf{O}[t]$ has a rectangle representation, we can formulate the emptiness constraint $\mathcal{R}_{r_t v}(\mathbf{x}_{\text{ref}}[t]) \cap \mathbf{O}[t] = \emptyset$ as $\bigvee_{i=1}^{n} (u_i < l'_i \lor l_i > u'_i)$, where $l_i$ and $u_i$ (resp. $l'_i$ and $u'_i$) are the lower and upper bounds of $\mathcal{R}_{r_t v}(\mathbf{x}_{\text{ref}}[t])$ (resp. $\mathbf{O}[t]$) in the $i$th dimension. Since such simplifications can exponentially reduce the number of constraints generated, they play a crucial role in scalability.

The constraints for checking emptiness and disjointness, as discussed above, give rise only to linear constraints, do not involve ∀ quantification over states, and constitute a sound transformation of $\phi_{\text{synth}}$ into QF-LRA. In Sect. 3.6 we will see that the reachset over-approximation can be made arbitrarily small when the disturbance is 0 by arbitrarily shrinking the size of the initial cover. Thus, these checks will also turn out to be sufficient to ensure that, if there exists a controller, $\phi_{\text{synth}}$ is satisfiable.

**Lemma 2.** *Let $v \in \mathbb{R}^n$ and $r_0, \ldots, r_T \in \mathbb{R}$ be such that for any execution $\mathbf{x}_{\text{ref}}$ starting at $x_0$, we have $\forall t \leq T \cdot \mathit{Reach}(\mathcal{B}_r(x_0), t) \subseteq \mathcal{R}_{r_t v}(\mathbf{x}_{\text{ref}}[t])$. If the formula $\phi_{\text{synth}}(x_0, r)$ is satisfiable, then there is a control sequence $\mathbf{u}_{\text{ref}}$ such that for every $x \in \mathcal{B}_r(x_0)$ and for every $\mathbf{d} \in D^T$, the unique execution $\mathbf{x}$ defined by the controller $\langle K, x_0, \mathbf{u}_{\text{ref}}\rangle$ and $\mathbf{d}$, starting at $x$, satisfies $\mathbf{x}[T] \in G \land \forall t \leq T \cdot \mathbf{x}[t] \notin \mathbf{O}[t]$.*

We remark that a possible alternative for eliminating the ∀ quantifier is the use of Farkas' Lemma, but this gives rise to nonlinear constraints<sup>2</sup>. Indeed, in our experimental evaluation, we observed the downside of resorting to Farkas' Lemma in this problem.

#### **3.6 Synthesis Algorithm: Putting It All Together**

The presentation in Sect. 3.5 describes how to formalize constraints to generate a control sequence that works for a subset of the initial set Θ. The overall synthesis procedure (Algorithm 1), first computes a tracking controller K, then generates open-loop control sequences and reference executions in order to cover the entire set Θ.

```
Algorithm 1. Algorithm for Synthesizing Combined Controller
 1: Input: A, T, O[0], ..., O[T], G, Q, R
 2: r∗ ← diameter(Θ)/2
 3: K, v, c1, c2 ← bloatParams(A, T, Q, R)
 4: cover ← ∅
 5: controllers ← ∅
 6: while Θ ⊈ cover do
 7:   ψsynth ← getConstraints(A, T, O[0], ..., O[T], G, v, c1, c2, r∗, cover)
 8:   if checkSat(ψsynth) = SAT then
 9:     r, uref, xref ← model(ψsynth)
10:     cover ← cover ∪ Br(xref[0])
11:     controllers ← controllers ∪ {(⟨K, xref[0], uref⟩, Br(xref[0]))}
12:   else
13:     r∗ ← r∗/2
14: return controllers
```
The procedure bloatParams computes a tracking controller $K$, a vector $v$, and real-valued parameters $\{c_1[t]\}_{t \leq T}$, $\{c_2[t]\}_{t \leq T}$ for the system $\mathcal{A}$ and time bound $T$, with $Q, R$ for the LQR method. Given any reference execution $\mathbf{x}_{\text{ref}}$ and an initial set $\mathcal{B}_r(\mathbf{x}_{\text{ref}}[0])$, the parameters computed by bloatParams can be used to over-approximate $\mathit{Reach}(\mathcal{B}_r(\mathbf{x}_{\text{ref}}[0]), t)$ with the rectangle $\mathcal{R}_{v'}(\mathbf{x}_{\text{ref}}[t])$, where $v' = (c_1[t]\, r + c_2[t])\, v$. The computation of these parameters proceeds as follows. Matrix $K$ is determined using LQR (Proposition 1). We then use Eq. (7) to compute the matrix $M$ and the rate of convergence $\alpha$. Vector $v$ is computed such that $\mathcal{E}_1(0, M)$ is bounded by $\mathcal{R}_v(0)$. Let $r_{\text{unit}} = \max_{x \in \mathcal{B}_1(0)} \|x\|_M$ and $\delta = \max_{d \in D} \|d\|_M$. Then we have $\mathcal{B}_r(x_0) \subseteq \mathcal{E}_{r \cdot r_{\text{unit}}}(x_0, M)$ for any $x_0$. The constants $c_1[0], \ldots, c_1[T], c_2[0], \ldots, c_2[T]$ are computed as $c_1[t] = \alpha^{\frac{t}{2}}\, r_{\text{unit}}$ and $c_2[t] = \sum_{i=0}^{t-1} \alpha^{\frac{i}{2}}\, \delta$. Sections 3.2–3.4 establish the correctness guarantees of these

<sup>2</sup> Farkas' Lemma introduces auxiliary variables that get multiplied with existing variables **x**ref[0],..., **x**ref[T], leading to nonlinear constraints.

parameters. Clearly, these computations are independent of any reference executions **x**ref and control sequences **u**ref.

The procedure getConstraints constructs the logical formula $\psi_{\text{synth}}$ below such that, whenever $\psi_{\text{synth}}$ holds, we can find an initial radius $r$ and center $x_0$ in the set $\Theta \setminus \text{cover}$, and a control sequence $\mathbf{u}_{\text{ref}}$, such that any controlled execution starting from $\mathcal{B}_r(x_0)$ satisfies the reach-avoid requirements.

$$\psi\_{\mathsf{synth}} \triangleq \exists x\_0 \exists r \cdot \left( x\_0 \in \Theta \land x\_0 \notin \mathsf{cover} \land r > r^\* \land \phi\_{\mathsf{synth}}(x\_0, r) \right) \tag{10}$$

Recall that the constants r0,...,r<sup>T</sup> used in φsynth are affine functions of r and thus ψsynth falls in the QF-LRA fragment.

Line 8 checks the satisfiability of $\psi_{\text{synth}}$. If it is satisfiable, we extract the generated model to get the radius of the initial ball, the control sequence $\mathbf{u}_{\text{ref}}$, and the reference execution $\mathbf{x}_{\text{ref}}$ (Line 9). The generated controller $\langle K, \mathbf{x}_{\text{ref}}[0], \mathbf{u}_{\text{ref}}\rangle$ is guaranteed to work for the ball $\mathcal{B}_r(\mathbf{x}_{\text{ref}}[0])$, which can be marked *covered* by adding it to the set cover. In order to keep all the constraints linear, one can further under-approximate $\mathcal{B}_r(\mathbf{x}_{\text{ref}}[0])$ with the rectangle $\mathcal{R}_w(\mathbf{x}_{\text{ref}}[0])$, where $w(i) = r/\sqrt{n}$ for each dimension $i \leq n$. If $\psi_{\text{synth}}$ is unsatisfiable, we reduce the minimum radius $r^*$ (Line 13) and continue to look for controllers, until we find that $\Theta \subseteq \text{cover}$.
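The inner rectangle used above is tight: with half-width $r/\sqrt{n}$ per dimension, the corners of $\mathcal{R}_w(0)$ lie at Euclidean distance exactly $r$ from the center, so the rectangle sits inside $\mathcal{B}_r$. A tiny sanity check of this fact (illustrative only):

```python
# Sketch: the rectangle with half-width r/sqrt(n) per dimension is an
# under-approximation of the ball B_r(center): its corners (the points
# of the rectangle farthest from the center) lie exactly on the sphere
# of radius r. Purely a numeric sanity check of that geometric fact.
from math import sqrt

def corner_dist(r, n):
    w = r / sqrt(n)                        # per-dimension half-width
    corner = [w] * n                       # a farthest corner of R_w(0)
    return sqrt(sum(c * c for c in corner))

assert all(abs(corner_dist(2.5, n) - 2.5) < 1e-9 for n in range(1, 8))
```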

The set controllers is a set of pairs $(\langle K, x_0, \mathbf{u}_{\text{ref}}\rangle, S)$ such that the controller $\langle K, x_0, \mathbf{u}_{\text{ref}}\rangle$ drives the set $S$ to meet the desired specification. Each time a new controller is found, it is added to the set controllers together with the initial set for which it works (Line 11). The following theorem asserts the soundness of Algorithm 1; it follows from Lemmas 1 and 2.

**Theorem 1.** *If Algorithm 1 terminates, then the synthesized controller is correct. That is, (a) for each $x \in \Theta$, there is a $(\langle K, x_0, \mathbf{u}_{\text{ref}}\rangle, S) \in \text{controllers}$ such that $x \in S$, and (b) for each $(\langle K, x_0, \mathbf{u}_{\text{ref}}\rangle, S) \in \text{controllers}$, the controller $\langle K, x_0, \mathbf{u}_{\text{ref}}\rangle$ is such that for every $x \in S$ and for every $\mathbf{d} \in D^T$, the unique execution defined by $\langle K, x_0, \mathbf{u}_{\text{ref}}\rangle$ and $\mathbf{d}$, starting at $x$, satisfies the reach-avoid specification.*

Algorithm 1 ensures that, upon termination, every $x \in \Theta$ is covered, i.e., one can construct a combined controller that drives $x$ to $G$ while avoiding **O**. However, it may find multiple controllers for a point $x \in \Theta$. This non-determinism can be easily resolved by picking any controller assigned to $x$.

Below, we show that, under certain robustness assumptions on the system $\mathcal{A}$, the goal $G$, and the obstacle sets **O**, and in the absence of disturbance, Algorithm 1 terminates.

**Robustly Controllable Systems.** A system $\mathcal{A} = \langle A, B, \Theta, U, D\rangle$ is said to be $\varepsilon$-*robustly controllable* ($\varepsilon > 0$) with respect to the reach-avoid specification (**O**, $G$) and matrix $K$ if (a) $D = \{0\}$, and (b) for every initial state $\theta \in \Theta$ and for every open-loop controller $\mathbf{u}_{\text{ref}} \in U^T$ such that the unique execution starting from $\theta$ using the open-loop controller $\mathbf{u}_{\text{ref}}$ satisfies the reach-avoid specification, with the controller $\langle K, \theta, \mathbf{u}_{\text{ref}}\rangle$ defined as in Eq. (3) we have $\forall t \leq T$, $\mathit{Reach}(\mathcal{B}_{\varepsilon}(\theta), t) \cap \mathbf{O}[t] = \emptyset$ and $\mathit{Reach}(\mathcal{B}_{\varepsilon}(\theta), T) \subseteq G$, i.e., $\forall x \in \mathcal{B}_{\varepsilon}(\theta)$, the unique trajectory **x** defined by the controller $\langle K, \theta, \mathbf{u}_{\text{ref}}\rangle$ starting from $x$ also satisfies the reach-avoid specification.

**Theorem 2.** *Let* A *be* ε*-robust with respect to the reach-avoid specification* (**O**, G) *and* K*, for some* ε > 0*. If there is a controller for* A *that satisfies the reach-avoid specification, then Algorithm 1 terminates.*

When the system is robust (and in the absence of any disturbance, i.e., $D = \{0\}$), the sizes $r_0, r_1, \ldots, r_T$ of the hyper-rectangles that over-approximate the reachsets get arbitrarily close to 0 as the initial cover shrinks to a single point (as seen in Lemma 1). Therefore, the over-approximations can be made arbitrarily precise as $r^*$ decreases. Moreover, as $r^*$ approaches 0, Eq. (9) (with the simplifications for QF-LRA) also becomes satisfiable whenever there is a controller. Theorem 2 follows from these two observations.

### **4 RealSyn Implementation and Evaluation**

#### **4.1 Implementation**

We have implemented our synthesis algorithm in a tool called RealSyn. RealSyn is written in Python. For solving Eq. (10), it can interface with any SMT solver through Python APIs. We present experimental results with Z3 (version 4.5.1) [6], Yices (version 2.5.4) [8], and CVC4 (version 1.5) [4]. RealSyn leverages the incremental solving capabilities of these solvers as follows. The constraints $\psi_{\text{synth}}$ generated (Line 7 in Algorithm 1) can be expressed as $\exists x_0, \exists r \cdot \psi_1 \land \psi_2$, where $\psi_1 \stackrel{\Delta}{=} \phi_{\text{synth}}(x_0, r)$ and $\psi_2 \stackrel{\Delta}{=} x_0 \in \Theta \land x_0 \notin \text{cover} \land r > r^*$. Since the bulk of the formula is in $\psi_1$ and it does not change across iterations, we generate this formula only once and push it onto the context stack of the solvers. The formula $\psi_2$ differs across iterations, and can be pushed onto and popped off the stack as required. This minimizes the time taken for the generation of constraints.

#### **4.2 Evaluation**

We use 24 benchmark examples<sup>3</sup> to evaluate the performance of RealSyn with three different solvers, on a standard laptop with an Intel® Core™ i7 processor and 16 GB RAM, running Ubuntu 16.04. The results are reported in Table 1. They are encouraging and demonstrate the effectiveness of our approach and the feasibility of scalable controller synthesis for high dimensional systems and complex reach-avoid specifications.

**Comparison with Other Tools.** We considered other controller synthesis tools for possible comparison with RealSyn. In summary, CoSyMA [27], Pessoa [30], and SCOTS [31] do not explicitly support discrete-time systems. LTLMoP [22,37] is designed to analyze robotic systems in the (2-dimensional)

<sup>3</sup> The examples are available at https://github.com/umangm/realsyn.


**Table 1.** Controller synthesis using RealSyn and different SMT solvers. An explanation for the <sup>∗</sup> marked entries can be found in Sect. 4.

Euclidean plane and is thus not suitable for most of our examples. TuLiP [13,39] comes closest to addressing the same class of problems. TuLiP relies on discretization of the state space and a receding-horizon approach for synthesizing controllers for more general GR(1) specifications. However, we found that TuLiP succumbs to the state-space explosion problem when discretizing the state space, and it did not work on most of our examples. For instance, TuLiP was unable to synthesize a controller for the 2-dimensional system '1-robot' (Table 1), and returned unrealizable. On the benchmark '2-robot' (n = 4), TuLiP did not return any answer within 1 h. We checked these findings with the developers, who concurred that it is typical for TuLiP to take hours even for 4-dimensional systems.

**Benchmarks.** Our benchmarks and their SMT encodings could be of independent interest to the verification and SMT communities. Examples 1–10 are vehicle motion planning examples that we designed with reach-avoid specifications. Benchmarks 1–2 model robots moving on the Euclidean plane; each robot is a 2-dimensional system with a 1-dimensional input. Starting from some initial region on the plane, the robots are required to reach a common goal area within the given number of time steps, while avoiding certain obstacles. For '2-robot', the robots are also required to maintain a minimum separation. Benchmarks 3–7 are discrete vehicular models adopted from [12]. Each vehicle is a 4-dimensional system with a 2-dimensional input. Benchmark 3 is the same system as our running example. Benchmark 4 describes an *ego* vehicle on a two-lane road, trying to overtake the vehicle in front of it; the second vehicle serves as the obstacle. Benchmarks 5–7 are similar to Benchmark 2 in that the vehicles are required to reach a common goal area while avoiding collision with the obstacles and with each other (inspired by a merge). The velocities and accelerations of the vehicles are also constrained in each of these benchmarks.

Benchmarks 8–10 model multiple vehicles trying to form a platoon by maintaining the safe relative distance between consecutive vehicles. The models are adopted (and discretized) from [32]. Each vehicle is a 2-dimensional system with 1-dimensional input. For the 4-car platoon model, the running times reported in Table 1 are much smaller than the time (5 min) reported in [32]. This observation aligns with our analysis in Sect. 3.1.

Benchmarks 11–21 are from [2]. The specification here is that the reach set has to stay within a safe rectangle (that is, G = *true*). In [2], each model is discretized using 8 different time steps; here we randomly pick one for each model. In general, the running time of RealSyn is lower than those reported in [2] (though the machine reported there had a better configuration). On the other hand, the controller synthesized in [2] accounts for quantization errors, whereas our approach provides no such guarantee.

Benchmarks 22–24 are a set of high-dimensional examples adopted and discretized from [36]. As in the previous set, the only specification is that the reach sets under the synthesized controller, starting from any initial state, be contained within a safe rectangle.

**Synthesis Performance.** In Table 1, columns 'n' and 'm' give the dimensions of the state space and the input space. For each background solver, '#iter' is the number of iterations Algorithm 1 required to synthesize a controller, and 'time' is the corresponding running time. We set a time limit of 1 h and report T/O (timeout) for benchmarks that do not finish within this limit. All benchmarks are synthesized for specifications with 10–20 steps.

In general, for low-dimensional systems (for example, Benchmarks 11–21), each of the solvers finishes quickly (in less than 1 s), with CVC4 and Yices outperforming Z3 on most benchmarks. The Yices solver is faster than the other two on most examples. Z3 was the slowest on most, except for a few (e.g., Benchmarks 3 and 6) where CVC4 was much slower. The running time generally increases with dimensionality, but the relationship is far from simple. For example, the 84-dimensional Benchmark 24 was synthesized in less than 9 s by both CVC4 and Yices, possibly because the safety specification is rather simple for this problem.

The three solvers use different techniques for solving QF-LRA formulae with support for incremental solving. The default tactic in Z3 is such that it spends a large chunk of time when a constraint is pushed onto the solver stack. In fact, for Benchmark 24, while the other two solvers finish within 9 s, Z3 did not even finish pushing the constraints onto the solver stack. When we disable incremental solving in Z3, Benchmarks 22, 23, and 24 finish in about 650, 240, and 1800 s, respectively (marked with <sup>∗</sup>). The number of iterations varies widely across solvers, with CVC4 usually finishing in the fewest iterations. Despite its larger number of satisfiability queries, Yices manages to finish close to CVC4 on most examples.

#### **5 Conclusion**

We proposed a novel technique for synthesizing controllers for systems with discrete-time linear dynamics, operating under bounded disturbances, and for reach-avoid specifications. Our approach relies on generating controllers that combine an open-loop controller with a tracking controller, thereby allowing a decoupled approach that synthesizes each component independently. Experimental evaluation using our tool RealSyn demonstrates the value of the approach when analyzing systems with complex dynamics and specifications.

There are several avenues for future work. This includes synthesis of combined controllers for nonlinear dynamical and hybrid systems, and for more general temporal logic specifications. Generating witnesses to show the absence of controllers is also an interesting direction.

#### **References**


366 C. Fan et al.

**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **Synthesis of Asynchronous Reactive Programs from Temporal Specifications**

Suguman Bansal<sup>1</sup>, Kedar S. Namjoshi<sup>2</sup>, and Yaniv Sa'ar<sup>3</sup>

> <sup>1</sup> Rice University, Houston, TX, USA suguman@rice.edu
> <sup>2</sup> Bell Labs, Nokia, Murray Hill, NJ, USA kedar.namjoshi@nokia-bell-labs.com
> <sup>3</sup> Bell Labs, Nokia, Kfar Saba, Israel yaniv.saar@nokia.bell-labs.com

**Abstract.** Asynchronous interactions are ubiquitous in computing systems and complicate design and programming. Automatic construction of asynchronous programs from specifications ("synthesis") could ease the difficulty, but known methods are complex, and intractable in practice. This work develops substantially simpler synthesis methods. A direct, exponentially more compact automaton construction is formulated for the reduction of asynchronous to synchronous synthesis. Experiments with a prototype implementation of the new method demonstrate feasibility. Furthermore, it is shown that for several useful classes of temporal properties, automaton-based methods can be avoided altogether and replaced with simpler Boolean constraint solving.

#### **1 Introduction**

Modern software and hardware systems harness asynchronous interactions to improve speed, responsiveness, and power consumption: delay-insensitive circuits, networks of sensors, multi-threaded programs and interacting web services are all asynchronous in nature. Various factors contribute to asynchrony, such as unpredictable transmission delays, concurrency, distributed execution, and parallelism. The common result is that each component of a system operates with partial, out-of-date knowledge of the state of the others, which considerably complicates system design and programming. Yet, it is often easier to state the desired behavior of an asynchronous program. We therefore consider the question of automatically constructing (i.e., synthesizing) a correct reactive asynchronous program directly from its temporal specification.

The *asynchronous synthesis problem* was originally formulated by Pnueli and Rosner in 1989, on the heels of their work on synchronous synthesis [31,32]. The task is to construct a (finite-state) program which interacts asynchronously with its environment while meeting a temporal specification on the actions at the interface between program and environment. Given a linear temporal specification ϕ, Pnueli and Rosner show that *asynchronous* synthesis can be reduced to checking whether a derived specification ϕ′, specifying the required behavior of the scheduler, is *synchronously* synthesizable. That is, an asynchronous program can implement ϕ iff a synchronous program can implement ϕ′.

It may then appear straightforward to construct asynchronous programs using one of the many tools that exist for synchronous synthesis. However, the derived formula ϕ′ embeds a nontrivial stutter quantification, which requires a complex intermediate automaton construction; it has not, to the authors' knowledge, ever been implemented. This situation is in stark contrast to that of synchronous synthesis, for which multiple tools and algorithms have been created.

Alternative methods have been proposed for asynchronous synthesis: Finkbeiner and Schewe reduce a bounded form of the problem to a SAT/SMT query [35], and Klein, Piterman and Pnueli show that some GR(1) specifications<sup>1</sup> can be transformed as above to an approximate synchronous GR(1) property [21,22]. These alternatives, however, have drawbacks of their own. The SAT/SMT reduction is exponential in the number of interface (input and output) bits, an important parameter; the GR(1) specifications amenable to transformation are limited and are characterized by semantic conditions that are not easily checked.

This work presents two key simplifications. First, we define a new property, PR(ϕ) (named in honor of Pnueli and Rosner's pioneering work) which, like ϕ′, is synchronously realizable if, and only if, ϕ is asynchronously realizable. We then present an automaton construction for PR(ϕ) that is direct and simpler, and results in an exponentially smaller automaton than the one for ϕ′. In particular, the automaton for PR(ϕ) has at most *twice* the states of the automaton for ϕ, as opposed to the *exponential blowup* of the state space (in the number of interface bits) incurred in the construction of the automaton for ϕ′. As almost all synchronous automaton-based synthesis tools use an explicit encoding for automaton states, this reduction is vital in practice.

We show how to implement the transformation PR symbolically (with BDDs), so that interface bits are always represented in symbolic form. One can then apply the modular strategy of Pnueli-Rosner: a symbolic automaton for ϕ is transformed to a symbolic automaton for PR(ϕ) (instead of ϕ′), which is analyzed with a synchronous synthesis tool. We establish that PR is conjunctive and preserves safety<sup>2</sup>. These are important properties, used by tools such as Acacia+ [8] and Unbeast [11] to optimize the synchronous synthesis task. The new construction has been implemented in a prototype tool, BAS, and experiments demonstrate feasibility in practice.

In addition, we establish that for several classes of temporal properties, which are easily characterized by syntax, the automaton-based method can be avoided entirely and replaced with Boolean constraint solving. The constraints are quantified Boolean formulae, with prefix ∃∀ and a kernel that is derived from the original specification. This surprising reduction, which resolves a temporal problem with Boolean reasoning, is a consequence of the highly adversarial role of the environment in the asynchronous setting.

<sup>1</sup> The GR(1) ("General Reactivity (1)") subclass has an efficient symbolic procedure for synchronous synthesis, formulated in [28] and implemented in several tools.

<sup>2</sup> I.e., PR(⋀<sub>i</sub> f<sub>i</sub>) = ⋀<sub>i</sub> PR(f<sub>i</sub>), and PR(f) is a safety property if f is a safety property.

These contributions turn a seemingly intractable synthesis task into one that is feasible in practice.

#### **2 Preliminaries**

**Temporal Specifications.** Linear Temporal Logic (LTL) [29] extends propositional logic with temporal operators. LTL formulae are defined by the grammar ϕ ::= True | False | p | ¬ϕ | ϕ<sub>1</sub> ∧ ϕ<sub>2</sub> | Xϕ | ϕ<sub>1</sub>Uϕ<sub>2</sub> | ♦ϕ | □ϕ | ⊟ϕ. Here p is a proposition, and X (Next), U (Until), ♦ (Eventually), □ (Always), and ⊟ (Always in the past) are temporal operators. The LTL semantics is standard, and is given in the full version of the paper. For an LTL formula ϕ, let L(ϕ) denote the set of words (over subsets of propositions) that satisfy ϕ.

GR(1) is a useful fragment of LTL, where formulae are of the form (□S<sub>e</sub> ∧ ⋀<sup>m</sup><sub>i=0</sub> □♦P<sub>i</sub>) ⇒ (□S<sub>s</sub> ∧ ⋀<sup>n</sup><sub>i=0</sub> □♦Q<sub>i</sub>), for propositional formulae S<sub>e</sub>, S<sub>s</sub>, P<sub>i</sub>, Q<sub>i</sub>. Typically, the left-hand side of the implication is used to restrict the environment, by requiring safety and liveness assumptions to hold, while the right-hand side is used to define the safety and liveness guarantees required of the system.

LTL specifications can be turned into equivalent Büchi automata, using standard constructions. A Büchi automaton, A, is specified by a tuple (Q, Q<sub>0</sub>, Σ, δ, G), where Q is a set of states, Q<sub>0</sub> ⊆ Q defines the initial states, Σ is the alphabet, δ ⊆ Q × Σ × Q is the transition relation, and G ⊆ Q defines the "green" (also known as "accepting" or "final") states. A *run* r of the automaton on an infinite word σ = a<sub>0</sub>, a<sub>1</sub>,... over Σ is an infinite sequence r = q<sub>0</sub>, a<sub>0</sub>, q<sub>1</sub>, a<sub>1</sub>,... such that q<sub>0</sub> is an initial state and, for each k, (q<sub>k</sub>, a<sub>k</sub>, q<sub>k+1</sub>) is in the transition relation. Run r is accepting if a green state appears on it infinitely often; the language of A, denoted L(A), is the set of words that have an accepting run.
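For intuition, the acceptance condition can be checked effectively on ultimately periodic ("lasso") words. The sketch below uses a small deterministic Büchi automaton of our own devising (for "infinitely many a's"); it illustrates the acceptance condition only and is not a construction from the paper.

```python
# Deterministic Büchi automaton (our own toy example) for "infinitely
# many a's" over the alphabet {a, b}: state 'qa' is green and means the
# last letter read was 'a'.
delta = {('qa', 'a'): 'qa', ('qa', 'b'): 'qb',
         ('qb', 'a'): 'qa', ('qb', 'b'): 'qb'}
q0, green = 'qb', {'qa'}

def accepts_lasso(prefix, loop):
    """Accept the word prefix . loop^omega iff a green state is visited
    infinitely often, i.e. inside the eventual cycle of loop copies."""
    q = q0
    for a in prefix:
        q = delta[(q, a)]
    boundary = {}   # state at the start of a loop copy -> copy index
    greens = []     # greens[i]: was a green state visited during copy i?
    while q not in boundary:
        boundary[q] = len(greens)
        hit = False
        for a in loop:
            q = delta[(q, a)]
            hit = hit or q in green
        greens.append(hit)
    # Copies boundary[q], boundary[q]+1, ... now repeat forever.
    return any(greens[boundary[q]:])
```

Here `accepts_lasso('b', 'ab')` holds, while `accepts_lasso('a', 'b')` does not, since the word a·b<sup>ω</sup> contains only one a.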

**The Asynchronous Synthesis Model.** The goal of synthesis is to construct an "open" program M meeting a specification at its interface. In the asynchronous setting, the program M interacts in a fair interleaved manner with its environment E. The fairness restriction requires that E and M are each scheduled infinitely often in all infinite executions. Let E//M denote this composition. The interface between E and M is formed by the variables x and y. Variable x is written by E and is read-only for M, while y is written by M and is read-only for E. One can consider x (resp., y) to represent a vector of variables, i.e., x = (x<sub>1</sub>,...,x<sub>n</sub>) (resp., y = (y<sub>1</sub>,...,y<sub>m</sub>)), which is read (resp., written) atomically. Many of our results extend to non-atomic reads and writes, as discussed in the full version of the paper.

The synthesis task is to construct a program M which satisfies a temporal property ϕ(x, y) over the interface variables in the composition E//M, for *any* environment E. The most adversarial environment is the one which sets x to an arbitrary value at each scheduled step; we denote it by CHAOS(x). The behaviors of the composition CHAOS(x)//M simulate those of E//M for every E. Hence, it suffices to produce an M which satisfies ϕ in the composition CHAOS(x)//M. One can limit the set of environments through an assumption in the specification.

This leads to the formal definition of an *asynchronous schedule*, given by a pair of functions r, w : N → N, which represent read and write points, respectively. The initial write point is w(0) = 0, and it represents the choice of initial value for the variable y. Without loss of generality, the read and write points alternate, i.e., for all i ≥ 0, w(i) ≤ r(i) < w(i + 1) and r(i) < w(i + 1) ≤ r(i + 1). A *strict* asynchronous schedule does not allow read and write points to overlap, i.e., the constraints are strengthened to w(i) < r(i) < w(i + 1) and r(i) < w(i + 1) < r(i + 1). A *tight* asynchronous schedule is the strict schedule without any positions that are neither read nor write points, i.e., r(k) = 2k + 1 and w(k) = 2k, for all k. A *synchronous* schedule is the special non-strict schedule where r(i) = i and w(i) = i, for all i.
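These schedule conditions can be captured directly. The following sketch (our own illustration, checking the constraints only up to a finite horizon n) validates a candidate (r, w) pair and instantiates the tight and synchronous schedules from the definition.

```python
# Illustrative finite-horizon check of the schedule conditions; r and w
# are functions N -> N giving the read and write points.

def is_schedule(r, w, n, strict=False):
    """Check the alternation constraints for indices 0..n-1."""
    if w(0) != 0:
        return False
    before = (lambda a, b: a < b) if strict else (lambda a, b: a <= b)
    return all(
        before(w(i), r(i)) and r(i) < w(i + 1) and before(w(i + 1), r(i + 1))
        for i in range(n)
    )

tight_r = lambda k: 2 * k + 1     # tight schedule: reads at odd positions,
tight_w = lambda k: 2 * k         # writes at even positions
sync = lambda k: k                # synchronous schedule: r(i) = w(i) = i
```

As expected, the tight schedule satisfies even the strict constraints, while the synchronous schedule satisfies only the non-strict ones.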

Let D<sub>v</sub> denote the binary domain {True, False} for a variable v. A program M can be represented semantically as a function f : (D<sub>x</sub>)<sup>∗</sup> → D<sub>y</sub>. For an asynchronous schedule (r, w), a sequence σ ∈ (D<sub>x</sub> × D<sub>y</sub>)<sup>ω</sup> is said to be an *asynchronous execution of* f *over* (r, w) if the value of y changes only at writing points, in a manner that depends only on the values of x at prior reading points. Formally, for all i ≥ 0, y<sub>w(i+1)</sub> = f(x<sub>r(0)</sub> ...x<sub>r(i)</sub>), and for all j such that w(i) ≤ j < w(i + 1), y<sub>j</sub> = y<sub>w(i)</sub>. The initial value of y is the value it has at point w(0) = 0. The set of such sequences is denoted asynch(f). Over synchronous schedules, the set of such sequences is denoted synch(f). Function f is an asynchronous implementation of ϕ if all asynchronous executions of f over all possible schedules satisfy ϕ, i.e., if asynch(f) ⊆ L(ϕ).
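A finite prefix of an asynchronous execution can be computed directly from this definition. The sketch below (an illustration with hypothetical names, run here under the tight schedule) keeps the y-value constant within each block and updates it at writing points from the inputs read so far.

```python
# Sketch (hypothetical names) of a finite prefix of an asynchronous
# execution of f over a schedule (r, w): within block i the output is
# constant, and the value written at w(i+1) is f applied to the inputs
# read at r(0), ..., r(i).

def async_execution(f, xs, r, w, y0, num_writes):
    """Return the (x, y) pairs up to position w(num_writes) - 1."""
    ys, y = [], y0                       # y0 is the value written at w(0) = 0
    for i in range(num_writes):
        ys.extend([y] * (w(i + 1) - w(i)))          # block i keeps y constant
        y = f([xs[r(k)] for k in range(i + 1)])     # y at position w(i+1)
    return list(zip(xs, ys))             # zip truncates to the computed prefix

# Example: f copies its most recent read, under the tight schedule.
r = lambda k: 2 * k + 1
w = lambda k: 2 * k
xs = [0, 1, 0, 0, 1, 1, 0, 1]
execution = async_execution(lambda reads: reads[-1], xs, r, w, 0, 3)
```

In the example, the output stays 0 through the first block, then replays the value read at position 1, then the value read at position 3.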

This formulation agrees with that given by Pnueli and Rosner for strict schedules. For synchronous schedules (and other non-strict schedules), our formulation has a Moore-style semantics – the output depends on strictly earlier inputs – while Pnueli and Rosner formulate a Mealy semantics. A Moore semantics is more appropriate for modeling software programs, where the output variable is part of the state, and fits well with the theoretical constructions that follow.

**Definition 1 (Asynchronous LTL Realizability).** *Given an LTL property* ϕ(x, y) *over the input variable* x *and output variable* y*, the* asynchronous LTL realizability *problem is to determine whether there is an asynchronous implementation for* ϕ*.*

**Definition 2 (Asynchronous LTL Synthesis).** *Given a realizable LTL formula* ϕ*, the* asynchronous LTL synthesis *problem is to construct an asynchronous implementation of* ϕ*.*

**Examples.** Pnueli and Rosner give a number of interesting specifications. The specification □(y ≡ Xx) ("the current output equals the next input") is satisfiable but not realizable, as any implementation would have to be clairvoyant. On the other hand, the flipped specification □(x ≡ Xy) ("the next output equals the current input") is synchronously realizable by a Moore machine which replays the current input as the next output. The specification ♦□x ≡ ♦□y is synchronously realizable by the same machine, but is asynchronously unrealizable, as shown next. Consider two input (x) sequences, under a schedule where reads happen only at odd positions. In both, let x = true at all reading points. Then any program must respond to both inputs with the same output sequence for y. Now suppose that in the first sequence x is false at all non-read positions, while in the second, x is true at all non-read positions. In the first case, the specification forces the output y-sequence to be false infinitely often; in the second, y is forced to be true from some point on, a contradiction.

The negated specification ¬(♦□x ≡ ♦□y) is also asynchronously unrealizable, for the same reason. This "gap" illustrates an intriguing difference from the synchronous case, where either a specification is realizable for the system, or its negation is realizable for the environment. The two halves of the equivalence, i.e., ♦□x ⇒ ♦□y and ♦□y ⇒ ♦□x, are individually asynchronously realizable, by strategies that fix the output to y = true and to y = false, respectively.

**From Asynchronous to Synchronous Synthesis.** Pnueli and Rosner reduced asynchronous LTL synthesis to synchronous synthesis of Büchi objectives. Their reduction applied to LTL formulas with a single input and output variable [32]; it was later extended to the non-atomic case [30]. The original Pnueli-Rosner reduction deals exclusively with strict schedules, since they showed that it suffices to consider only strict schedules.

Two infinite sequences are said to be *stuttering equivalent* if one can be obtained from the other by finite duplication ("stretching") of a given state or by deletion ("compressing") of finitely many contiguous identical states, retaining at least one of them. The *stuttering quantification* ∃<sup>≈</sup> is defined as follows: ∃<sup>≈</sup>x.ϕ holds for a sequence π if ∃x.ϕ holds for some sequence π′ that is stuttering equivalent to π. Pnueli and Rosner showed that an LTL formula ϕ(x, y) over input x and output y is asynchronously realizable iff a "kernel" formula (this is the precise formula referred to as ϕ′ in the Introduction) K(r, w, x, y) = α(r, w) → β(r, w, x, y), over read sequence r, write sequence w, input sequence x, and output sequence y, is synchronously realizable:

$$\begin{array}{rl}
\alpha(r,w) \;=\; & (\neg r \land (\neg w\;\mathrm{U}\;r)) \;\land\; \Box\neg(r \land w) \;\land\; \Box\big(r \Rightarrow (r\;\mathrm{U}\;((\neg r)\;\mathrm{U}\;w))\big) \\
& \land\; \Box\big(w \Rightarrow (w\;\mathrm{U}\;((\neg w)\;\mathrm{U}\;r))\big) \\
\beta(r,w,x,y) \;=\; & \varphi(x,y) \;\land\; \forall a.\,\Box\big((y=a) \Rightarrow ((y=a)\;\mathrm{U}\;(\neg w \land ((y=a)\;\mathrm{U}\;w)))\big) \\
& \land\; \forall^{\approx} x'.\big(\Box(\neg r \Rightarrow (\neg r\;\mathrm{U}\;(x=x'))) \Rightarrow \varphi(x',y)\big)
\end{array}$$

Here, α encodes the strict scheduling constraints on read and write points, while β encodes conditions which assure a correct asynchronous execution over (r, w). The ∀<sup>≈</sup> quantification, intuitively, quantifies over all adversarial schedules similar to the current (r, w): it requires ϕ to hold over all sequences obtained from the current sequence σ by stretching or compressing the segments between read and write points, and choosing different values for x on those segments.
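For finite sequences, stuttering equivalence can be decided by comparing stutter-free normal forms. The sketch below is a finite-prefix illustration only (the definitions above are over infinite sequences), with names of our own choosing.

```python
# Finite-prefix illustration of stuttering equivalence: two finite
# sequences are stuttering equivalent iff collapsing each maximal run of
# identical states to a single copy yields the same sequence.

def destutter(seq):
    out = []
    for s in seq:
        if not out or out[-1] != s:
            out.append(s)
    return out

def stutter_equivalent(a, b):
    return destutter(a) == destutter(b)
```

For instance, "aabbbc" and "abc" are stuttering equivalent, while "abab" and "ab" are not, since deletion may only remove contiguous duplicates.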

#### **3 Symbolic Asynchronous Synthesis**

Pnueli and Rosner's procedure for asynchronous synthesis [32] is as follows: first, a Büchi automaton is built for the kernel formula ¬K. This automaton is then determinized and complemented to form a deterministic word automaton for K, which is then re-interpreted as a tree automaton and tested for non-emptiness. The transformations use standard constructions, except for the interpretation of the ∃<sup>≈</sup> operator in the formation of the Büchi automaton for ¬K. For a Büchi automaton A, an automaton for ∃<sup>≈</sup>L(A) is constructed in two steps: first applying a "stretching" transformation on A, followed by a "compressing" transformation. Stretching introduces new automaton states of the form (q, a), for each state q of A and each letter a.

When this general construction is applied to the formula ¬K, the alphabet of the automaton A is formed of all possible valuations of the pair of variables (x, y), which has size *exponential* in the number of interface bits. The stretching step introduces a copy of an automaton state for each letter, which results in an exponential blow-up of the state space of the constructed automaton. As all current tools for synchronous synthesis represent automaton states explicitly<sup>3</sup>, the exponential blowup introduced by the stuttering quantification is a significant obstacle to implementation.

In Pnueli-Rosner's construction, the determinization and complementation steps are also complex, utilizing Safra's construction. These steps are simplified by the "Safraless" procedure adopted in current tools for synchronous synthesis.

The other major issue with the Pnueli-Rosner construction is that the kernel formula <sup>K</sup> introduces the scheduling variables r, w as input variables. However, the actions of a synthesized program should not rely on the values of these variables. Pnueli-Rosner ensure this by checking satisfiability over "canonical" tree models; it is unclear, however, how to realize this effect using a synchronous synthesis tool as a black box.

We define a new property, PR(ϕ), that differs from <sup>K</sup> but, similarly, is synchronously realizable if, and only if, ϕ is asynchronously realizable. We then present an automaton construction for PR(ϕ) that bypasses the general construction for ∃<sup>≈</sup>, avoiding the exponential blowup and resulting in an automaton with *at most twice* the states of the original. Moreover, this construction refers only to x and y, avoiding the second issue as well. We then show that this construction can be implemented fully symbolically.

#### **3.1 Basic Formulations and Properties**

As formulated in Sect. 2, an asynchronous execution of f is determined by the schedule (r, w). For a strict schedule, any infinite sequence representing an asynchronous behavior of f over (r, w) may be partitioned into a sequence of *blocks*, as follows. The start of the i'th block is at the i'th writing point, w(i), and it

<sup>3</sup> With one exception. BoSy's DQBF procedure is fully symbolic but does not work as well as the default QBF procedure [12].

**Fig. 1.** A strict asynchronous computation for *f*. Values of *x* at non-reading points are shown as dotted. The *y*-value is constant between writing points, illustrated by a solid rectangle. Blocks are shown as dashed rectangles.

ends just before the (i + 1)'st writing point, w(i + 1). The schedule ensures that the i'th block includes the i'th reading point, r(i), associated with the input-output value (x<sub>i</sub>, y<sub>i</sub>). As the value of y changes only at writing points, y<sub>i</sub> is constant in the i'th block. Thus, the i'th block follows the pattern (⊥, y<sub>i</sub>)<sup>∗</sup>(x<sub>i</sub>, y<sub>i</sub>)(⊥, y<sub>i</sub>)<sup>∗</sup>, where ⊥ denotes an arbitrary choice of x-value. Figure 1 illustrates a strict asynchronous computation and its decomposition into blocks.

*Expansions.* The set of *expansions* of a sequence δ = (x<sub>0</sub>, y<sub>0</sub>)(x<sub>1</sub>, y<sub>1</sub>)... consists of all sequences obtained by simultaneously replacing each (x<sub>i</sub>, y<sub>i</sub>) in δ by a block with the pattern (⊥, y<sub>i</sub>)<sup>∗</sup>(x<sub>i</sub>, y<sub>i</sub>)(⊥, y<sub>i</sub>)<sup>∗</sup>. Formally, given sequences δ = (x<sub>0</sub>, y<sub>0</sub>)(x<sub>1</sub>, y<sub>1</sub>)... and σ = (x̄<sub>0</sub>, ȳ<sub>0</sub>)(x̄<sub>1</sub>, ȳ<sub>1</sub>)..., δ *expands to* σ, denoted δ exp σ, if there exists an asynchronous schedule (r̂, ŵ) for which σ is an execution following the block pattern of δ, i.e., for all i, x<sub>i</sub> = x̄<sub>r̂(i)</sub> and y<sub>i</sub> = ȳ<sub>ŵ(i)</sub>, and for all j with ŵ(i) ≤ j < ŵ(i + 1), it is the case that ȳ<sub>j</sub> = ȳ<sub>ŵ(i)</sub>. The inverse relation (read as *contracts to*) is denoted exp<sup>−1</sup>. Figure 2 shows the synchronous computation that contracts the computation shown in Fig. 1.
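The contraction direction can be illustrated on finite prefixes: given σ and a schedule, δ is recovered by sampling x at read points and y at write points, checking along the way that y is constant within each block. This is a sketch with hypothetical names, not code from the paper.

```python
# Sketch of contraction: recover delta from an expanded execution sigma
# (a list of (x, y) pairs), given the schedule (r, w).

def contract(sigma, r, w, n):
    delta = []
    for i in range(n):
        x_i, y_i = sigma[r(i)][0], sigma[w(i)][1]
        block = range(w(i), min(w(i + 1), len(sigma)))
        assert all(sigma[j][1] == y_i for j in block), "y changed mid-block"
        delta.append((x_i, y_i))
    return delta

# Example over the tight schedule (reads at odd, writes at even positions).
r = lambda k: 2 * k + 1
w = lambda k: 2 * k
sigma = [(0, 0), (1, 0), (0, 1), (0, 1), (1, 0), (1, 0)]
delta = contract(sigma, r, w, 3)
```

Each entry of δ pairs the x-value read in a block with the y-value that block holds constant.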

*Relational Operators.* For a relation R, the modal operators ⟨R⟩ and [R] are defined as follows. For any set S,

$$u \in \langle R \rangle S = (\exists v : uRv \land v \in S) \qquad \quad u \in [R]S = (\forall v : uRv \Rightarrow v \in S)$$

By definition, the operators are negation duals, i.e., ¬⟨R⟩(¬S) = [R](S) for any R and any S. For an LTL formula ϕ and a relation R over infinite sequences, we let ⟨R⟩ϕ abbreviate ⟨R⟩(L(ϕ)) and, similarly, let [R]ϕ abbreviate [R](L(ϕ)).

*Galois Connections.* Given partial orders (A, ⊑<sub>A</sub>) and (B, ⊑<sub>B</sub>), a pair of functions g : A → B and h : B → A forms a Galois connection if, for all a ∈ A, b ∈ B: g(a) ⊑<sub>B</sub> b is equivalent to a ⊑<sub>A</sub> h(b). From the definitions, it is clear that the operators (⟨R<sup>−1</sup>⟩, [R]) form a Galois connection over the partial orders defined by the subset relation, i.e., for any sets S and T: ⟨R<sup>−1</sup>⟩S ⊆ T iff S ⊆ [R]T.
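Over a finite universe, the modal operators and the Galois connection can be checked exhaustively. The relation R and universe U below are our own toy example, not drawn from the paper.

```python
# Exhaustive check of the modal operators and the Galois connection over
# a small, arbitrary relation R on a finite universe U.
from itertools import combinations

def diamond(R, S):
    """<R>S = {u | exists v with uRv and v in S}."""
    return {u for (u, v) in R if v in S}

def box(R, S, U):
    """[R]S = {u in U | every R-successor of u is in S}."""
    return {u for u in U if all(v in S for (u2, v) in R if u2 == u)}

def inverse(R):
    return {(v, u) for (u, v) in R}

def subsets(xs):
    xs = list(xs)
    return [set(c) for k in range(len(xs) + 1) for c in combinations(xs, k)]

U = {0, 1, 2, 3}
R = {(0, 1), (0, 2), (1, 2), (2, 3)}

# Galois connection: <R^-1>S ⊆ T  iff  S ⊆ [R]T, for all S, T ⊆ U.
galois_ok = all(
    (diamond(inverse(R), S) <= T) == (S <= box(R, T, U))
    for S in subsets(U) for T in subsets(U)
)
```

Note that ⟨R<sup>−1</sup>⟩S is exactly the post-image of S under R, which is why the equivalence holds for every relation, not just this example.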

We first establish that the asynchronous executions of f are precisely the synchronous executions of f under an inverse expansion.

**Theorem 1.** *For an implementation* f*,* asynch(f) = ⟨exp<sup>−1</sup>⟩synch(f)*.*

**Fig. 2.** The contracted synchronous (Moore) computation

*Proof.* (ping) Let σ be an execution in asynch(f), generated by some schedule (r, w). For any k, consider the k'th block of σ. This is the set of positions from w(k) to w(k + 1) − 1, which includes the k'th reading point r(k), say with the value (x<sub>k</sub>, y<sub>k</sub>). Then the block follows the pattern (⊥, y<sub>k</sub>)<sup>∗</sup>(x<sub>k</sub>, y<sub>k</sub>)(⊥, y<sub>k</sub>)<sup>∗</sup>. So σ is an expansion of the sequence δ = (x<sub>0</sub>, y<sub>0</sub>)(x<sub>1</sub>, y<sub>1</sub>).... By the definition of an asynchronous execution, the value y<sub>k+1</sub> = f(x<sub>0</sub>,...,x<sub>k</sub>). This is precisely the requirement for δ to be a synchronous execution of f. Hence, there is a δ such that δ exp σ and δ ∈ synch(f). Therefore, σ ∈ ⟨exp<sup>−1</sup>⟩synch(f).

(pong) Let σ be in ⟨exp<sup>−1</sup>⟩synch(f). By definition, there is a synchronous execution δ = (x<sub>0</sub>, y<sub>0</sub>)(x<sub>1</sub>, y<sub>1</sub>)... in synch(f) such that δ exp σ. As δ is a synchronous execution of f, the value y<sub>k+1</sub> = f(x<sub>0</sub>, x<sub>1</sub>,...,x<sub>k</sub>), for all k. Then σ is an asynchronous execution of f under the schedule where the k'th reading point is the point to which the k'th entry, (x<sub>k</sub>, y<sub>k</sub>), of δ is mapped in σ, and the (k + 1)'th writing point is the first point of the (k + 1)'st block in the expansion.

We now use the Galois connection to show how asynchronous synthesis can be reduced to an equivalent synchronous synthesis task. Consider a property ϕ that must hold asynchronously for an implementation f.

**Theorem 2.** *Let* f *be an implementation function, and* ϕ *a property. Then* asynch(f) ⊆ L(ϕ) *if, and only if,* synch(f) ⊆ [exp]ϕ*.*

*Proof.* From Theorem 1, asynch(f) ⊆ L(ϕ) holds iff ⟨exp<sup>−1</sup>⟩synch(f) ⊆ L(ϕ) does. By the Galois connection, this is equivalent to synch(f) ⊆ [exp]ϕ.

#### **3.2 The Pnueli-Rosner Closure**

We refer to the property [ exp ]ϕ as the Pnueli-Rosner closure of ϕ, in honor of their pioneering work on this problem, and denote it by PR(ϕ). This has interesting mathematical properties, which are useful in practice.

**Theorem 3.** PR(ϕ)=[ exp ]ϕ *has the following properties.*


The closure property relies on the reflexivity and transitivity of exp, and on the fact that [R] is monotonic for every R. Conjunctivity follows from the conjunctivity of [R] for any R. Safety preservation is based on the Alpern-Schneider [4] formulation of safety over infinite words. Proofs are in the full version of the paper.

Conjunctivity is exploited by the tools Acacia+ [8] and Unbeast [11] to optimize the synchronous synthesis procedure. The Unbeast tool also separates out safety from non-safety sub-properties to optimize the synthesis procedure. Thus, if a specification ϕ has the form ϕ₁ ∧ ϕ₂, where ϕ₁ is a safety property, then PR(ϕ) = PR(ϕ₁) ∩ PR(ϕ₂) is likewise the intersection of the safety property PR(ϕ₁) with another property.

#### **3.3 The Closure Automaton Construction**

By negation duality, PR(ϕ) equals ¬ exp(¬ϕ). We use this property to reduce asynchronous to synchronous synthesis, as follows.


The new step is the second one, which constructs B from A; the others use standard constructions and tools. This construction is as follows.


We establish that L(B) = exp L(A) through the following two lemmas.

**Lemma 1.** exp L(A) ⊆ L(B)*.*

*Proof.* Let δ = (x₀, y₀)(x₁, y₁)... be a sequence in exp L(A). By definition, there exists a sequence σ in L(A) such that δ exp σ. The expansion σ follows the pattern [(⊥, y₀)∗(x₀, y₀)(⊥, y₀)∗][(⊥, y₁)∗(x₁, y₁)(⊥, y₁)∗]..., where [...] are used merely to indicate the boundaries of a block. An accepting run of A on σ has the form q₀[(⊥, y₀)∗(x₀, y₀)(⊥, y₀)∗]q₁[(⊥, y₁)∗(x₁, y₁)(⊥, y₁)∗]q₂ ..., where the states on the run inside a block have been elided. By the definition of B, the segment q₀(⊥, y₀)∗(x₀, y₀)(⊥, y₀)∗q₁ induces a transition from q₀ to q₁ in B on the letter (x₀, y₀). Similarly, the following segment induces a transition from q₁ to q₂ on the letter (x₁, y₁), and so forth. These transitions together form a run q₀(x₀, y₀)q₁(x₁, y₁)q₂ ... of B on δ.

If one of the {qᵢ} is green and appears infinitely often on the run on σ, the induced run on δ is accepting. Otherwise, as the run on σ is accepting, some green state of A occurs in the interior of infinitely many segments of that run. The transitions of B induced by those segments must be green, so the corresponding run on δ has infinitely many green edges and is accepting for B.

**Lemma 2.** L(B) ⊆ exp L(A)*.*

*Proof.* Let δ be accepted by B. We show that there is a σ such that δ exp σ and σ is accepted by A. Let δ have the form (x₀, y₀)(x₁, y₁).... Denote the accepting run of B on δ by r = q₀(x₀, y₀)q₁(x₁, y₁).... From the construction of B, the transition from q₀ to q₁ on (x₀, y₀) has an associated witness path through A from q₀ to q₁, which follows the expansion pattern (⊥, y₀)∗(x₀, y₀)(⊥, y₀)∗ on its edge labels. Stitching together the witness paths for each transition of r, we obtain both a sequence σ that is an expansion of δ and a run r′ of A on σ.

As r is accepting for B, it must pass infinitely often through either a green state or a green edge. If it enters a green state infinitely often, that state appears infinitely often on r′. If r enters a green edge infinitely often, the witness path for that edge contains a green state of A, say q; as this path is repeated infinitely often on σ, q appears infinitely often on r′. In either case, a green state of A appears infinitely often on r′, which is therefore an accepting run of A on σ.

Automaton B can be placed in standard form by converting its green edges to green states as follows, forming a new automaton B̂. Form a green copy of the state space, i.e., for each state q, form a green variant, G(q), which is marked as an accepting state. Set up transitions as follows. If (q, a, q′) is an original non-green transition, then (q, a, q′) and (G(q), a, q′) are new transitions. If (q, a, q′) is an original green transition, then (q, a, G(q′)) and (G(q), a, G(q′)) are new transitions. This at most doubles the size of the automaton. It is straightforward to establish that L(B) = L(B̂).
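As a sketch, the green-copy construction can be written out for an automaton given explicitly as a set of transition triples. The encoding below (tuples, `('G', q)` for green copies) is an illustrative assumption, not the tool's symbolic representation.

```python
def remove_green_edges(transitions, green_edges, green_states):
    """Green-copy construction: every state q gets a green variant ('G', q);
    a green edge is redirected to the green copy of its target, so visiting
    a green edge infinitely often becomes visiting a green state infinitely
    often.  transitions: set of (q, letter, q') triples."""
    new_trans = set()
    for q, a, q2 in transitions:
        if (q, a, q2) in green_edges:
            new_trans |= {(q, a, ("G", q2)), (("G", q), a, ("G", q2))}
        else:
            new_trans |= {(q, a, q2), (("G", q), a, q2)}
    states = {q for q, _, _ in transitions} | {q2 for _, _, q2 in transitions}
    new_states = states | {("G", q) for q in states}
    # accepting: original green states plus all green copies
    accepting = set(green_states) | {("G", q) for q in states}
    return new_states, new_trans, accepting
```

Note that the state space at most doubles, matching the bound stated above.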

#### **3.4 Symbolic Construction**

The symbolic construction of B̂ closely follows the definitions above. It is easily implemented with BDDs representing predicates on the input and output variables x and y. The crucial step is to use fixpoints to formulate the existence of paths in the set Π used in the definition of B. These definitions are similar to the fixpoint definition of the CTL modality EF. We use A(q, (x, y), q′) to denote the predicate on (x, y) describing the transition from q to q′ in automaton A.

*Fixed Don't-Care Path.* Let EfixedY(q, y, q′) hold if there is a path of length 0 or more from q to q′ in A on which the value of y is fixed. This is the least fixpoint (in Z) of the following implications:

– (q = q′) ⇒ Z(q, y, q′), and
– (∃x, r : A(q, (x, y), r) ∧ Z(r, y, q′)) ⇒ Z(q, y, q′)

The predicate A⊥(q, y, r) = (∃x : A(q, (x, y), r)) is pre-computed. Then, the least fixpoint is computed iteratively as follows.

$$\begin{aligned} \mathsf{Efixed}\mathsf{Y}^{0}(q,y,q') &= (q=q')\\ \mathsf{Efixed}\mathsf{Y}^{i+1}(q,y,q') &= \mathsf{Efixed}\mathsf{Y}^{i}(q,y,q') \vee (\exists r: A^{\perp}(q,y,r) \wedge \mathsf{Efixed}\mathsf{Y}^{i}(r,y,q')) \end{aligned}$$
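For a small explicit-state automaton, the iteration can be sketched directly. The encoding of the don't-care input ⊥ as `None` and the explicit transition relation are assumptions for illustration; the actual tool computes this fixpoint over BDDs.

```python
def efixed_y(states, trans, ys):
    """Least fixpoint of EfixedY(q, y, q'): a path of length >= 0 from q
    to q' whose letters all carry output y and the don't-care input ⊥.
    trans: set of (q, (x, y), q'), with x == None encoding ⊥."""
    # A⊥(q, y, r): one don't-care step, precomputed from the relation
    a_bot = {(q, y, r) for (q, (x, y), r) in trans if x is None}
    z = {(q, y, q) for q in states for y in ys}  # EfixedY^0: empty paths
    while True:
        # EfixedY^{i+1}: prepend one A⊥ step to an existing fixed-y path
        step = {(q, y, q2)
                for (q, y, r) in a_bot
                for (r2, y2, q2) in z
                if r2 == r and y2 == y}
        if step <= z:          # no new triples: fixpoint reached
            return z
        z |= step
```

Termination is guaranteed because the set of triples grows monotonically within a finite domain.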

Let the predicate greenA(r) be true for an accepting state r of A. The predicate Efixedgreen(q, y, q′) holds if there is a fixed-y path from q to q′ on which one of the states is green:

$$\mathsf{Efixedgreen}(q, y, q') = (\exists r : \mathsf{EfixedY}(q, y, r) \land \mathsf{green}\_A(r) \land \mathsf{EfixedY}(r, y, q'))$$

*Paths and Green Paths.* Let Epath(q, (x, y), q′) hold if there is a path following the block pattern (⊥, y)∗(x, y)(⊥, y)∗ from q to q′ in A. Then,

$$\mathsf{Epath}(q,(x,y),q') = (\exists r, r' : \mathsf{EfixedY}(q,y,r) \wedge A(r,(x,y),r') \wedge \mathsf{EfixedY}(r',y,q'))$$

Similarly, let Egreenpath(q, (x, y), q′) hold if there is a path following the block pattern (⊥, y)∗(x, y)(⊥, y)∗ from q to q′ in A, with an intermediate green state.

$$\begin{aligned} \mathsf{Egreenpath}(q,(x,y),q') = \; &(\exists r, r' : \mathsf{Efixedgreen}(q,y,r) \wedge A(r,(x,y),r') \wedge \mathsf{EfixedY}(r',y,q')) \; \vee\\ &(\exists r, r' : \mathsf{EfixedY}(q,y,r) \wedge A(r,(x,y),r') \wedge \mathsf{Efixedgreen}(r',y,q')) \end{aligned}$$
*State Space of* B̂. The state space of B̂ is formed by pairs (q, g), where q is a state of A and g is a Boolean indicating whether the state is a new green state. The accepting condition greenB̂(q, g) of B̂ is given by greenA(q) ∨ g.

*Initial States.* The initial predicate IB̂(q, g) is IA(q) ∧ ¬g, where IA(q) is true for the initial states of the input automaton A.

*Transition Relation of* B̂. The transition relation B̂((q, g), (x, y), (q′, g′)) is

$$\hat{B}((q,g),(x,y),(q',g')) = \mathsf{Epath}(q,(x,y),q') \wedge (g' \equiv \mathsf{Egreenpath}(q,(x,y),q'))$$
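Putting the pieces together, an explicit-state analogue of this transition relation is a simple product construction. This is an illustrative sketch of the symbolic definition, with `epath` and `egreenpath` assumed to be precomputed as sets of triples.

```python
def bhat_transitions(states, letters, epath, egreenpath):
    """Transitions of B̂ over pairs (q, g): one transition per block
    pattern, with g' recording whether the block's witness path passed
    a green state.  epath, egreenpath: sets of (q, (x, y), q')."""
    trans = set()
    for q in states:
        for a in letters:
            for q2 in states:
                if (q, a, q2) in epath:
                    g2 = (q, a, q2) in egreenpath
                    for g in (False, True):  # g' does not depend on g
                        trans.add(((q, g), a, (q2, g2)))
    return trans
```

Note that the successor flag g′ depends only on the witness path for the current block, mirroring the equivalence in the formula above.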

#### **4 Implementation and Experiments**

The PR algorithm has been implemented in a framework called BAS (Bounded Asynchronous Synthesis). It uses the LTL-to-automaton converter LTL3BA [3,6] and follows the modular method, connecting to either of two solvers, BoSy [2,12] and Acacia+ [1,8], to solve the synchronous realizability of PR(ϕ). The PR construction is implemented in about 1200 lines of OCaml, using an external BDD library. (The core construction requires only about 400 lines of code.) For an LTL specification ϕ, the BAS workflow for asynchronous synthesis is as follows:

	- (a) Construct PR(ϕ) from A and check whether it is synchronously realizable; if so, return REALIZABLE and synthesize the implementation.
	- (b) Construct PR(¬ϕ) from A<sup>ˆ</sup> and check whether it is synchronously realizable for the environment; if so, return UNREALIZABLE.

Upon termination of either check, the other execution is terminated as well.

The synchronous synthesis tools successively increase a bound until a limit (computed based on automaton structure) is reached. Thus, in theory, only the check in step 3(a) is needed. However, the checks in steps 1 and 3(b) may allow the tool to terminate early (before reaching the limit bound), if a winning strategy for the environment can be discovered.

To evaluate BAS we consider the list of examples presented in Table 1. The reported experiments were performed on a VM configured to have 8 CPU cores at 2.4 GHz, 8 GB RAM, running 64-bit Linux. The running times are reported in milliseconds. For each specification (presented in the second column) we report whether it is asynchronously realizable (third column), the time for the PR construction (our contribution), and the time for checking whether the specification is realizable using BoSy and Acacia+ solvers (resp., fifth and sixth columns).

The first set of examples (Specifications 1–11) lists specifications discussed in this paper and in related work. As a parameterized example, we consider two variants of arbiter specifications. The arbiter has n inputs on which clients request permissions, and n outputs on which the clients are granted permissions. In both variants of the arbiter example, no two grants may be set simultaneously. The first arbiter example (Specification 12) requires that whenever an input request rᵢ is set, the corresponding output grant gᵢ must eventually be set. The second variant (Specification 13) additionally requires that a grant gᵢ is set only if request rᵢ is set as well. That is, for a client to be granted a permission, its corresponding request must be constantly set. Since in the asynchronous case the request cannot be observed between read events, this variant of the arbiter is not realizable. The results are shown for n = 2, 4, 6. Note that the only comparable experimental evaluation is given in [18], which reports that asynchronous synthesis of the first arbiter example (Specification 12) takes over 8 h.

**Table 1.** BAS asynchronous synthesis runtime evaluation (times in milliseconds). We let BoSy run up to 2 h, and Acacia+ up to 1000 iterations. "Na" denotes cases where the executions did not find a winning strategy within these bounds.


The second specification ϕ is the one discussed in Sect. 2. It is surprisingly difficult to solve. Both ϕ and its negation are asynchronously unrealizable. Moreover, ϕ is synchronously realizable. Thus, the early detection tests (steps 1 and 3(b)) failed to discover a winning strategy for the environment; the bounded synthesis tools increase the considered bound monotonically without converging to an answer in a reasonable amount of time. This example highlights the need for better tests for unrealizability. The results in the following section provide simple QBF tests of unrealizability for subclasses of LTL.


#### **5 Efficiently Solvable Subclasses of LTL**

The high complexity of direct LTL (synchronous) synthesis has encouraged the search for general procedures that work well in practice, such as Safraless and bounded synthesis [24,35]. Another useful direction has been to identify fragments of LTL with efficient synthesis algorithms [5]. Among the most noteworthy is the GR(1) subclass, for which there is an efficient, symbolic synthesis procedure ([28]). We explore this direction for *asynchronous* synthesis. Surprisingly, we show that synthesis for certain fragments of LTL can be reduced to Boolean reasoning over properties in QBF. The results cover several types of GR(1) formulae, although the question of a reduction for all of GR(1) is open.

The QBF formulae that arise have the form ∃y∀x.p(x, y), where x and y are disjoint sets of variables and p is a propositional formula over x, y. An assignment y = b for which ∀x.p(x, b) holds is called a *witness* to the formula. The first such reduction is for the property ♦P.

**Theorem 4.** ϕ = ♦P *is asynchronously realizable iff* ∃y∀x.P *is* True*.*

*Proof.* (ping) Let b be a witness to ∃y∀x.P. The function that constantly outputs y = b satisfies ϕ for any asynchronous schedule.

(pong) Let f be a candidate implementation function and suppose that ∀y∃x.(¬P) holds. Fix any schedule. For every value y = b that f outputs at a writing point, there exists an input value x = a such that ¬P(a, b) holds. Thus the environment, by issuing x = a in the interval from the current writing point (with y = b) up to the next one, can ensure that ¬P holds throughout the execution. Hence the specification ϕ = ♦P does not hold on this execution.

The result in Theorem 4 applies to asynchronous synthesis but not to synchronous synthesis. For example, the property ♦(x ≡ y) is asynchronously unrealizable, as ∃y∀x.(x ≡ y) is False. On the other hand, it is synchronously realizable by a Mealy machine that sets y to x at each point.
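For finite Boolean vectors, the ∃y∀x check of Theorem 4 can be decided by brute-force enumeration; the helper below is a minimal sketch of ours, not part of the BAS tool.

```python
from itertools import product

def exists_y_forall_x(p, nx, ny):
    """Brute-force check of ∃y∀x. p(x, y) over Boolean vectors of
    lengths nx and ny; returns a witness assignment b for y, or None."""
    for b in product([False, True], repeat=ny):
        if all(p(a, b) for a in product([False, True], repeat=nx)):
            return b
    return None

# ♦(x ≡ y) from the text: asynchronously unrealizable,
# since ∃y∀x.(x ≡ y) is False
print(exists_y_forall_x(lambda a, b: a[0] == b[0], 1, 1))  # None
```

Per the ping direction of the proof, any returned witness b yields a trivial implementation that constantly outputs b.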

Theorem 4 extends easily to conjunctions and disjunctions of ♦ properties.

**Theorem 5.** *Specification* ϕ = ♦P₀ ∨ ··· ∨ ♦Pₘ *is asynchronously realizable iff* ∃y∀x.(P₀ ∨ ··· ∨ Pₘ) *holds. Additionally, specification* ϕ = ♦P₀ ∧ ··· ∧ ♦Pₘ *is asynchronously realizable iff for all* i ∈ {0, 1,...,m}, ∃y∀x.Pᵢ *holds.*

*Proof.* The first claim follows directly from the identity ♦P₀ ∨ ··· ∨ ♦Pₘ ≡ ♦(P₀ ∨ ··· ∨ Pₘ) and Theorem 4.

For the second, for each i, let y = bᵢ be an assignment such that ∀x.Pᵢ(x, bᵢ) holds. The function that generates the sequence b₀, b₁,...,bₘ, ad infinitum, is an asynchronous implementation of ♦P₀ ∧ ··· ∧ ♦Pₘ. On the other hand, suppose that for some i, ∀y∃x.¬Pᵢ holds; then, following the construction from Theorem 4, one can define an execution where Pᵢ is always False.

**Theorem 6.** ϕ = ♦□P *is asynchronously realizable iff* ∃y∀x.P *is* True*.*

The proof is similar to that for Theorem 4. Theorem 6 also extends to conjunctions and disjunctions of ♦□ properties, by arguments similar to those for Theorem 5. Namely, ♦□P₀ ∧ ··· ∧ ♦□Pₘ is asynchronously realizable iff ∃y∀x.(P₀ ∧ ··· ∧ Pₘ) is True, and ♦□P₀ ∨ ··· ∨ ♦□Pₘ is asynchronously realizable iff for some i ∈ {0, 1,...,m}, ∃y∀x.Pᵢ is True. Theorems 4–6 apply to non-atomic reads and writes of multiple input and output variables. Proofs are in the full version of the paper.

We now consider a more general type of GR(1) formula. The *strict semantics* of the GR(1) formula □Sₑ ∧ □♦P ⇒ □Sₛ ∧ □♦Q is defined to be (□Sₑ ⇒ □Sₛ) ∧ (□Sₑ ∧ □♦P ⇒ □♦Q), i.e., Sₛ is required to hold so long as Sₑ has always held in the past; and if Sₑ holds always and P holds infinitely often, then Q holds infinitely often. This is the interpretation supported by GR(1) synchronous synthesis tools.

**Theorem 7.** *The strict semantics of the GR(1) specification* □Sₑ ∧ □♦P ⇒ □Sₛ ∧ □♦Q *is asynchronously realizable iff* ∃y∀x.(Sₑ ⇒ (Sₛ ∧ (P ⇒ Q))) *is* True*.*

*Proof.* (ping) If y = b is a witness to ∃y∀x.(Sₑ ⇒ (Sₛ ∧ (P ⇒ Q))), let f be a function that always generates b. Suppose Sₑ holds up to point i; then, as y = b, regardless of the x-value, Sₛ holds at point i. This shows that the first part of the specification holds. For the second, suppose that Sₑ holds always and P is true infinitely often. Then, by the choice of y = b, (P ⇒ Q) holds always, so Q holds infinitely often as well.

(pong) To prove the other direction, we proceed as in Theorem 4. Let f be a candidate implementation. Fix a schedule, and suppose that ∀y∃x.(Sₑ ∧ (¬Sₛ ∨ ¬(P ⇒ Q))) holds. Then for every value y = b that f outputs at a writing point, there exists a value x = a which the environment can choose from that writing point to the next, such that Sₑ(a, b) is true and one of Sₛ(a, b) or (P ⇒ Q)(a, b) is false at every point in that interval.

On this execution, Sₑ holds throughout. If Sₛ is false at some point, this violates the first part of the specification. If not, then (P ⇒ Q) must be false everywhere; i.e., at every point P is true but Q is false. Thus, Sₑ holds everywhere and P holds infinitely often, but Q does not hold infinitely often, violating the second part of the specification.
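The QBF test of Theorem 7 admits the same brute-force sketch over finite Boolean vectors; the predicates used in the example below are hypothetical toy inputs, not from the paper.

```python
from itertools import product

def gr1_realizable(se, ss, p, q, nx, ny):
    """Theorem 7 reduced to a finite check: the strict GR(1) specification
    is asynchronously realizable iff ∃y∀x.(Se ⇒ (Ss ∧ (P ⇒ Q))).
    se, ss, p, q: predicates over (x, y) Boolean vectors."""
    cond = lambda a, b: (not se(a, b)) or (ss(a, b) and ((not p(a, b)) or q(a, b)))
    for b in product([False, True], repeat=ny):
        if all(cond(a, b) for a in product([False, True], repeat=nx)):
            return b  # the constant-output implementation: always write b
    return None
```

As in the ping direction of the proof, a returned witness b directly gives the implementation that always outputs b.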

Theorem 7 applies to atomic reads and writes, showing that asynchronous synthesis of GR(1) specifications can be reduced to Boolean reasoning in QBF. For non-atomic reads and writes, safety in asynchronous systems is more nuanced, since there is a delay between the write points of the first and last outputs in each round. This is discussed in the full version of the paper. The proof strategy does not generalize easily to the full GR(1) format, where more than one □♦ property can appear on either side of the implication.

These results establish that the asynchronous synthesis problem for such specifications is easily solvable, indeed more easily than in the synchronous setting, surprisingly avoiding entirely the need for automaton constructions and bounded synthesis. From another, equally valuable, point of view, the results show that such types of specifications may be of limited interest for automated synthesis, as the solvable cases have very simple solutions.

#### **6 Conclusions and Related Work**

This work tackles the task of asynchronous synthesis from temporal specifications. The main results are a new symbolic automaton construction for general temporal properties, and the reduction of the synthesis question for several classes of specifications to QBF. These are mathematically interesting, being substantial simplifications of prior methods. Moreover, they make it feasible to implement an asynchronous synthesis tool following the modular process suggested by Pnueli and Rosner in 1989, by reducing asynchronous synthesis to a synchronous synthesis question. To the best of our knowledge, this is the first such tool. The prototype, which builds on tools for synchronous synthesis, is able to quickly synthesize asynchronous programs for several interesting properties. There are, undoubtedly, several challenges, one of which is the quick detection of unrealizable specifications.

Our work builds upon several earlier results, which we discuss here. The synthesis question for temporal properties originates from a question posed by Church in the 1950s (see [37]). The problem of synthesizing a synchronous reactive system from a linear temporal specification was formulated and studied by Pnueli and Rosner [31], who gave a solution based on non-emptiness of tree automata. There has been much progress on the synchronous synthesis question since. Key developments include the discovery of efficient symbolic (BDD-based) solutions for the GR(1) class [7,28], the invention of "Safraless" procedures [24], the application of these ideas for bounded synthesis [15,35], and their implementation in a number of tools, e.g. [8,10,11,13,20,34]. These have been applied in many settings (cf. [9,23,25–27]).

The problem of synthesizing asynchronous programs was also formulated and studied by Pnueli and Rosner [32] but has proved to be much more challenging, with only limited progress. The original Pnueli-Rosner constructions are complex and were not implemented. Work by Klein, Piterman and Pnueli, nearly 20 years later [22], shows tractability for some GR(1) specifications. However, the class of specifications that can be so handled is characterized by semantic constraints, such as stuttering-closure and memorylessness, which are difficult to recognize.

Finkbeiner and Schewe [18,35] present an alternative method, based on bounded synthesis, that applies to all LTL properties: it encodes the existence of a deductive proof for a bounded program into SAT/SMT constraints. However, the encoding represents inputs and outputs explicitly and is, therefore, exponential in the number of input and output bits. The exponential blowup has practical consequences: an asynchronous arbiter specification requires over 8 h to synthesize [18]; the same specification can be synthesized by our method in seconds. (Note, however, that the method in [18] is not specialized to asynchronous synthesis, and the difference may not be due solely to the explicit state representation, as the specification has only 4 bits.) Recent work gives an alternative encoding of synchronous bounded synthesis into QBF constraints, retaining input and output bits in symbolic form [12]. We believe that a similar encoding applies to asynchronous bounded synthesis as well; this is a topic for future work.

Pnueli and Rosner's model of interface communication is not the only choice. Other models for asynchrony could, for instance, be based on CCS/CSP-style rendezvous communication at the interface, or permit shared read-write variables with atomic lock/unlock actions. Petri net game models have also been suggested for distributed synthesis [16]. An orthogonal direction is to weaken the adversarial power of the environment through a probabilistic model which can be used to constrain unlikely, highly adversarial input patterns to have probability 0, thus turning the synthesis problem into one where programs satisfy their specifications with high probability. (The synthesis of multiple processes is known to be undecidable in most cases [17,33].)

In the broader context of fully automatic program synthesis, there are various approaches to the synthesis of single-threaded, terminating programs from formal pre- and post-condition specifications and from examples, using type information and other techniques to prune the search space. (We will not attempt to survey this large field, some examples are [14,19,36].) An intriguing question is to investigate how the techniques developed in these distinct lines of work can be fruitfully combined to aid the development of asynchronous, reactive software.

**Acknowledgements.** Kedar Namjoshi was supported, in part, by NSF grant CCF-1563393 from the National Science Foundation. We would like to thank Michael Emmi for many helpful discussions during the early stages of this work.

#### **References**



### **Syntax-Guided Synthesis with Quantitative Syntactic Objectives**

Qinheping Hu(B) and Loris D'Antoni

University of Wisconsin-Madison, Madison, USA {qhu28,loris}@cs.wisc.edu

**Abstract.** Automatic program synthesis promises to increase the productivity of programmers and end-users of computing devices by automating tedious and error-prone tasks. Despite the practical successes of program synthesis, we still do not have systematic frameworks to synthesize programs that are "good" according to certain metrics (e.g., programs of reasonable size or with good runtime) and to understand when synthesis can result in such good programs. In this paper, we propose QSyGuS, a unifying framework for describing syntax-guided synthesis problems with quantitative objectives over the syntax of the synthesized programs. QSyGuS builds on weighted (tree) grammars, a clean and foundational formalism that provides flexible support for different quantitative objectives, useful closure properties, and practical decision procedures. We then present an algorithm for solving QSyGuS. Our algorithm leverages closure properties of weighted grammars to generate intermediate problems that can be solved using non-quantitative SyGuS solvers. Finally, we implement our algorithm in a tool, QuaSi, and evaluate it on 26 quantitative extensions of existing SyGuS benchmarks. QuaSi can synthesize optimal solutions in 15/26 benchmarks with times comparable to those needed to find an arbitrary solution.

#### **1 Introduction**

The goal of program synthesis is to find a program in some search space that meets a specification—e.g., a set of examples or a logical formula. Recently, a large family of synthesis problems has been unified into a framework called syntax-guided synthesis (SyGuS). A SyGuS problem is specified by a context-free grammar describing the search space of programs, and a logical formula describing the specification. Many synthesizers now support this format [2] and annually compete in synthesis competitions [4]. Thanks to these competitions, these solvers are now quite mature and are finding wide application [14].

While the logical specification mechanism provided by SyGuS is powerful, it can only capture the functional requirements of the synthesis problem—e.g., the program should perform correctly on a given set of input/output examples. When multiple possible programs can satisfy the specification, SyGuS *does not* provide a way to prefer one to the other—e.g., one cannot ask a solver to return the program with the fewest if-statements. As a consequence, existing synthesis tools do not provide guarantees about what solution is returned if multiple ones exist. While a few synthesizers have attempted to include some form of specification to express this kind of quantitative intent [7,15,16,19], these approaches are domain-specific, do not apply to SyGuS problems, and do not provide a simple and flexible specification mechanism. The lack of a formal treatment of quantitative requirements stands in the way of designing synthesizers that can take advantage of quantitative objectives to perform more efficient forms of synthesis.

In this paper, we propose QSyGuS, a unifying framework for describing syntax-guided synthesis problems with quantitative objectives over the syntax of the synthesized programs—e.g., find the most likely program with respect to a given probability distribution—and present an algorithm for solving synthesis problems expressed in this framework. We focus on syntactic objectives because they are the most common ones in practical applications of program synthesis. For example, in programming by examples it is desirable to produce small programs with fewer constants because these programs are more likely to generalize to examples outside of the specification [13]. QSyGuS extends SyGuS in two ways. First, in QSyGuS the search space is represented using weighted grammars, which augment context-free grammars with the ability to assign weights to programs. Second, QSyGuS allows the user to specify constraints over the weight of the program, including optimization objectives—e.g., find the program with the fewest if-statements and with the lowest depth.

QSyGuS is a natural, general, and flexible formalism and is grounded in the well-studied theory of weighted grammars. We leverage this theory and design an algorithm for solving QSyGuS problems using closure properties of weighted grammars. Given a QSyGuS problem, our algorithm generates a SyGuS problem that can be delegated to existing SyGuS solvers. The algorithm then iteratively refines the solution returned by the SyGuS solver to find an optimal one by further generating new SyGuS instances using weighted grammar operations. We implement our algorithm in a tool, QuaSi, and evaluate it on 26 quantitative extensions of existing SyGuS benchmarks. QuaSi can synthesize optimal solutions in 15/26 benchmarks with times comparable to those needed to find a solution that does not need to satisfy any quantitative objective.

**Contributions.** In summary, our contributions are:


**Fig. 1.** Weighted grammar that assigns weight (w₁, w₂) ∈ Nat × Nat to a program, where w₁ is the number of if-statements and w₂ is the number of plus-statements.

#### **2 Illustrative Example**

In this section, we illustrate the main components of our framework using an example. We start with a Syntax-Guided Synthesis (SyGuS) problem in which no quantitative objective is provided. We recall that the goal of a SyGuS problem is to synthesize a function f of a given type that is accepted by a context-free grammar G, and such that ∀x.φ(f,x) holds (for a given Boolean constraint φ).

The following SyGuS problem asks to synthesize a function that is accepted by the following grammar and that computes the max of two numbers.

```
Start ::= Start + Start | if (BExpr) then Start else Start | x | y | 0 | 1
BExpr ::= Start > Start | ¬BExpr | BExpr ∧ BExpr
```
The semantic constraint is given by the following formula.

$$
\psi(f) \stackrel{\text{def}}{=} \forall x, y.\; f(x, y) \ge x \land f(x, y) \ge y \land (f(x, y) = x \lor f(x, y) = y)
$$

The following three programs are semantically equivalent, but syntactically different solutions.

```
max₁(x, y) = if (x > y) then x else y
max₂(x, y) = if (x > y) then (x + 0) else (y + 0)
max₃(x, y) = if (x > y) then x else (if (y > x) then y else x)
```

All solutions are correct, but the user might, for example, prefer the smallest one. However, SyGuS does not provide a way to specify this quantitative intent.

*Adding Weights.* In our formalism, QSyGuS, we augment context-free grammars with the ability to assign weights to the programs in the search space. Concretely, we adopt weighted grammars [10], a well-studied formalism with many desirable properties. In a weighted grammar, each production is assigned a weight. For example, the weighted grammar shown in Fig. 1 extends the one from the previous SyGuS example to assign to each program p a pair of weights (w₁, w₂), where w₁ is the number of if-statements and w₂ is the number of plus operators in p. In this case, the weights are pairs of integers, and the weight of a grammar derivation is the pairwise sum of the weights of all the productions involved in the derivation; e.g., the sum of (w₁, w₂) and (w′₁, w′₂) is (w₁ + w′₁, w₂ + w′₂). In the figure, we write /(w₁, w₂) to assign weight (w₁, w₂) to a production, and we omit the weight for productions with cost (0, 0). The functions max₁, max₂, and max₃ have weights (1, 0), (1, 2), and (2, 0), respectively.
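As a sketch of how such pair weights compose, the following snippet (our own illustration, not part of QSyGuS; the tree encoding and production names are invented) computes the weight of a program tree by summing production weights pairwise over all productions used:

```python
# Illustration only: pair weights (w1, w2) = (#if-statements, #plus-operators).
# Production names ("ite", "plus", "gt", ...) are hypothetical.
from functools import reduce

# Weight attached to each production; omitted productions cost (0, 0).
WEIGHTS = {
    "ite": (1, 0),   # if (...) then ... else ...
    "plus": (0, 1),  # ... + ...
}

def pair_sum(a, b):
    return (a[0] + b[0], a[1] + b[1])

def weight(tree):
    """Weight of a derivation: pairwise sum over all productions used."""
    op, children = tree
    w = WEIGHTS.get(op, (0, 0))
    return reduce(pair_sum, (weight(c) for c in children), w)

# max2(x, y) = if (x > y) then (x + 0) else (y + 0)
max2 = ("ite", [("gt", [("x", []), ("y", [])]),
                ("plus", [("x", []), ("0", [])]),
                ("plus", [("y", []), ("0", [])])])
print(weight(max2))  # (1, 2)
```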

*Adding and Solving Quantitative Objectives.* Once we have a way to assign weights to programs, QSyGuS allows the user to specify quantitative objectives over the weights of the programs, e.g., only allow solutions with fewer than 4 if-statements. In our example, we could require the solution to be minimal with respect to the number of if-statements, i.e., to minimize the first component of the weight pair. Under this constraint, both max₁ and max₂ would be considered optimal solutions because there exists no solution with 0 if-statements. If we require the solution to also be minimal with respect to the second component of the weight pair, max₁ will be *a possible* optimal solution.

Our tool QuaSi can automatically discover solutions in both of these cases. Let's consider the latter minimization objective. In this case, QuaSi first uses existing SyGuS solvers to synthesize an initial solution over the non-weighted version of the grammar. Say the returned solution is max₃, of weight (2, 0). QuaSi uses this solution to build a new SyGuS instance that only accepts programs with at most one if-statement. Solving this SyGuS problem can, for example, yield the program max₂, of weight (1, 2), which will cause our solver to build yet another SyGuS instance. This process is repeated and, if it terminates, an optimal program is found.
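The loop just described can be sketched as follows; `solve_sygus` and `restrict` are hypothetical stand-ins for a SyGuS solver call and the grammar-restriction step, here simulated over a finite pool of candidate programs rather than a real grammar:

```python
# A minimal sketch (not QuaSi itself) of the refinement loop described above.
def minimize(grammar, solve_sygus, restrict):
    """Iteratively tighten the weight bound until no better solution exists."""
    best = solve_sygus(grammar)              # initial, unconstrained solution
    if best is None:
        return None                          # the SyGuS problem is unsatisfiable
    while True:
        program, w = best
        # Build a new instance accepting only programs of weight < w.
        better = solve_sygus(restrict(grammar, w))
        if better is None:
            return program, w                # no strictly better program: optimal
        best = better

# Toy "grammar": a finite pool of (program, weight) pairs, mimicking the
# max3 -> max2 chain of the example (weights are if-statement counts).
pool = [("max3", 2), ("max2", 1)]
solve = lambda g: g[0] if g else None
restrict = lambda g, w: [(p, v) for p, v in g if v < w]
print(minimize(pool, solve, restrict))  # ('max2', 1)
```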

#### **3 SyGuS with Quantitative Objectives**

In this section, we introduce our framework for defining syntax-guided synthesis problems with quantitative objectives over the syntax of the synthesized programs. We first provide preliminary definitions for notions such as semirings (Sect. 3.1) and weighted tree grammars (Sect. 3.2), and then use these notions to augment SyGuS problems with quantitative objectives (Sect. 3.3).

#### **3.1 Weights over Semirings**

We now define the universe of weights we will assign to programs. In general, weights can be defined using monoids (i.e., sets equipped with an addition operator), but when a grammar is nondeterministic (i.e., it can produce the same term using multiple derivations) the same term might be assigned multiple weights. Hence, we choose to use semirings, whose additional operator ⊕ can aggregate the weights of multiple derivations of the same term. Since we also care about optimization objectives, we assume all our semirings are equipped with a partial order.

**Definition 1 (Semiring).** *An (ordered)* semiring *is a pair* (**S**, ⪯) *where (i)* **S** = (S, ⊕, ⊗, 0, 1) *is an algebra consisting of a commutative monoid* (S, ⊕, 0) *and a monoid* (S, ⊗, 1) *such that* ⊗ *distributes over* ⊕, 0 ≠ 1, *and, for every* x ∈ S, x ⊗ 0 = 0; *and (ii)* ⪯ ⊆ S × S *is a partial order over* S.

We often use the word semiring to refer to just the algebra **S**.

*Example 1.* In this paper, we focus on semirings with the following algebras.

**Boolean** Bool = (𝔹, ∨, ∧, false, true). This semiring only contains the values true and false and is used to represent non-quantitative problems.

**Tropical** Trop = (ℤ ∪ {∞}, min, +, ∞, 0). This semiring is the most common one and is used to assign additive weights, e.g., term size and term depth. In this case, we typically take the order ⪯ to be ≤.

**Probabilistic** Prob = ([0, 1], +, ·, 0, 1). This semiring is used to assign probabilities to terms in a grammar.
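As an illustration (ours, not the paper's), the three algebras can be written down directly as (⊕, ⊗, 0̄, 1̄) tuples of Python operations, with `math.inf` playing the role of ∞ in the tropical semiring:

```python
# Sketch of the three semiring algebras from Example 1.
import math

BOOL = (lambda a, b: a or b,  lambda a, b: a and b, False,    True)
TROP = (min,                  lambda a, b: a + b,   math.inf, 0)
PROB = (lambda a, b: a + b,   lambda a, b: a * b,   0.0,      1.0)

oplus, otimes, zero, one = TROP
print(otimes(3, one))   # 0 is the tropical ⊗-identity: 3 + 0 = 3
print(oplus(5, zero))   # inf is the tropical ⊕-identity: min(5, inf) = 5
```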

In our framework, we allow synthesis problems to have multiple objectives. Hence, we define a product operation to compose semirings. Intuitively, the following operation composes the algebras of two semirings into an algebra over pairs, applying the operation of each component algebra to the corresponding projection of the pair. Similarly, two orders can be composed into an order over pairs of elements. We propose two such compositions: one which gives the two orders equal importance (Pareto) and one which prefers one order over the other (Sorted).

**Definition 2 (Products).** *Given two algebras* **S**₁ = (S₁, ⊕₁, ⊗₁, 0₁, 1₁) *and* **S**₂ = (S₂, ⊕₂, ⊗₂, 0₂, 1₂), *the* product algebra *is the tuple* **S**₁ ×ₛ **S**₂ = (S₁ × S₂, ⊕, ⊗, (0₁, 0₂), (1₁, 1₂)) *such that, for every* x₁, x₂ ∈ S₁ *and* y₁, y₂ ∈ S₂, *we have* (x₁, y₁) ⊕ (x₂, y₂) ≝ (x₁ ⊕₁ x₂, y₁ ⊕₂ y₂) *and* (x₁, y₁) ⊗ (x₂, y₂) ≝ (x₁ ⊗₁ x₂, y₁ ⊗₂ y₂).

*Given two partial orders* ⪯₁ ⊆ S₁ × S₁ *and* ⪯₂ ⊆ S₂ × S₂, *the* Pareto product *of the two orders is the partial order* ⪯ₚ = par(⪯₁, ⪯₂) ⊆ (S₁ × S₂) × (S₁ × S₂) *such that, for every* x₁, x₂ ∈ S₁ *and* y₁, y₂ ∈ S₂, *we have* (x₁, y₁) ⪯ₚ (x₂, y₂) *iff* x₁ ⪯₁ x₂ *and* y₁ ⪯₂ y₂.

*Given two partial orders* ⪯₁ ⊆ S₁ × S₁ *and* ⪯₂ ⊆ S₂ × S₂, *the* Sorted product *of the two orders is the partial order* ⪯ₛ = sort(⪯₁, ⪯₂) ⊆ (S₁ × S₂) × (S₁ × S₂) *such that, for every* x₁, x₂ ∈ S₁ *and* y₁, y₂ ∈ S₂, *we have* (x₁, y₁) ⪯ₛ (x₂, y₂) *iff* x₁ ≺₁ x₂ *or* (x₁ = x₂ *and* y₁ ⪯₂ y₂).

*Example 2.* The weights in the grammar in Fig. 1 are from the product semiring Trop ×ₛ Trop. When using the Pareto product, we have, for example, (1, 0) ⪯ₚ (2, 0) and (1, 0) ⪯ₚ (1, 2), but (1, 2) is incomparable to (2, 0). When using the Sorted product, we have, for example, (1, 0) ⪯ₛ (1, 2) ⪯ₛ (2, 0).
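The two order compositions from Definition 2 can be sketched in a few lines (an illustration of ours, using the tropical order ≤ on each component):

```python
# Pareto and Sorted products over pairs of tropical weights.
def pareto_le(a, b):
    """(x1, y1) <=_p (x2, y2) iff x1 <= x2 and y1 <= y2."""
    return a[0] <= b[0] and a[1] <= b[1]

def sorted_le(a, b):
    """(x1, y1) <=_s (x2, y2) iff x1 < x2, or x1 = x2 and y1 <= y2."""
    return a[0] < b[0] or (a[0] == b[0] and a[1] <= b[1])

# Example 2: under Pareto, (1, 2) and (2, 0) are incomparable;
# under Sorted, (1, 0) <= (1, 2) <= (2, 0).
print(pareto_le((1, 0), (2, 0)))                                # True
print(pareto_le((1, 2), (2, 0)) or pareto_le((2, 0), (1, 2)))   # False
print(sorted_le((1, 0), (1, 2)) and sorted_le((1, 2), (2, 0)))  # True
```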

#### **3.2 Weighted Tree Grammars**

Since SyGuS defines search spaces using context-free grammars, we propose to extend this formalism with weights in order to assign costs to terms in the grammar. We focus our attention on a restricted class of context-free grammars called regular tree grammars, i.e., grammars generating regular tree languages, because, to our knowledge, the benchmarks appearing in the SyGuS competition [3] and in practical applications of SyGuS operate over tree grammars. Moreover, it was recently shown that SyGuS problems that are undecidable for context-free grammars become decidable for regular tree grammars [8].

*Trees.* A *ranked alphabet* is a tuple (Σ, rk_Σ) where Σ is a finite set of symbols and rk_Σ : Σ → ℕ associates a rank with each symbol. For every m ≥ 0, the set of all symbols in Σ with rank m is denoted by Σ⁽ᵐ⁾. In our examples, a ranked alphabet is specified by showing the set Σ with the rank of each symbol attached as a superscript, e.g., Σ = {+⁽²⁾, c⁽⁰⁾}. We use T_Σ to denote the set of all (ranked) trees over Σ, i.e., T_Σ is the smallest set such that (*i*) Σ⁽⁰⁾ ⊆ T_Σ, and (*ii*) if σ ∈ Σ⁽ᵏ⁾ and t₁, …, tₖ ∈ T_Σ, then σ(t₁, …, tₖ) ∈ T_Σ. In the following we assume a fixed ranked alphabet (Σ, rk_Σ).
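A minimal sketch of this definition (the tuple encoding of trees is our own): a tree over Σ = {+⁽²⁾, c⁽⁰⁾} is well-formed iff every symbol has exactly as many children as its rank.

```python
# Membership in T_Σ for the ranked alphabet {+(2), c(0)}.
RANK = {"+": 2, "c": 0}

def is_tree(t):
    """Check that each symbol's arity matches its rank, recursively."""
    symbol, children = t
    return RANK[symbol] == len(children) and all(is_tree(c) for c in children)

c = ("c", [])
print(is_tree(("+", [c, ("+", [c, c])])))  # True
print(is_tree(("+", [c])))                 # False: "+" has rank 2
```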

*Weighted Tree Grammars.* Tree grammars are similar to word grammars, but they generate ranked trees instead of words. Weighted tree grammars augment tree grammars by assigning weights from a semiring to trees. They do so by associating weights with the productions of the grammar. Weighted grammars can, for example, compute the height of a tree, the number of occurrences of some node in the tree, or the probability of a tree with respect to some distribution. In the following, we assume a fixed semiring (**S**, ⪯) where **S** = (S, ⊕, ⊗, 0, 1).

**Definition 3 (Weighted Tree Grammar).** *A* weighted tree grammar *(WTG) is a tuple* G = (N, Z, P, μ)*, where* N *is a set of non-terminal symbols with arity 0,* Z *is an axiom with* Z ∈ N*,* P *is a set of production rules of the form* A → β *where* A ∈ N *is a non-terminal and* β *is a tree of* T(Σ ∪ N)*, and* μ : P → S *is a function assigning to each production a weight from the semiring.*

We can now define the semantics of a WTG as a function w_G : T_Σ → S, which assigns weights to trees. Intuitively, the weight of a tree is the ⊕-sum of the weights of every possible derivation of that tree in the grammar, and the weight of a derivation is the ⊗-product of the weights of the productions appearing in it. We use MS(β) = X₁, …, Xₖ to denote the multi-set of all nonterminals appearing in β, and β[t₁/X₁, …, tₖ/Xₖ] to denote the result of simultaneously substituting each Xᵢ with tᵢ in β. Given a production p = A → β such that MS(β) = X₁, …, Xₖ, we treat p as a symbol of arity k. A derivation starting at nonterminal X is a tree of productions d ∈ T_P representing one possible way to derive a tree starting from X. The derivation has to be such that: (*i*) the root of d is a production of the form X → β; (*ii*) for every node p = A → β in d, if MS(β) = X₁, …, Xₖ, then, for every 1 ≤ i ≤ k, the i-th child of p is a production Xᵢ → βᵢ. Given a derivation d with root p = X → β, such that MS(β) = X₁, …, Xₖ and p has child subtrees d₁, …, dₖ, the tree generated by d is recursively defined as tree(d) = β[tree(d₁)/X₁, …, tree(dₖ)/Xₖ]. We use der(X, t) to denote the set of all derivations d starting at X such that tree(d) = t. The weight dw(d) of a derivation d is the ⊗-product of the weights of the productions appearing in d. Finally, the weight of a tree t is the ⊕-sum of the weights of all its derivations from the axiom: w_G(t) = ⨁_{d ∈ der(Z,t)} dw(d). A weighted tree grammar is *unambiguous* iff, for every t ∈ T_Σ, there exists at most one derivation, i.e., |der(Z, t)| ≤ 1.

Weighted tree grammars generalize weighted tree automata. In particular, a *weighted tree automaton* (WTA) is a WTG in which every production is of the form A → σ(T₁, …, Tₙ), where A ∈ N, each Tᵢ ∈ N, and σ ∈ Σ⁽ⁿ⁾. Finally, a *tree automaton* (TA) is a WTA over the Boolean semiring, i.e., the TA accepts all trees with some derivation yielding true. Similarly, a *tree grammar* (TG) is a WTG over the Boolean semiring. Given a TA (resp. TG) G, we use L(G) to denote the set of trees accepted by G, i.e., L(G) = {t | w_G(t) = true}.

*Example 3.* The weighted grammar in Fig. 1 operates over the semiring Trop ×ₛ Trop; N = {Start, BExpr}, Z = Start, P contains 9 productions, and μ assigns non-zero weights to two of them.

Aside from being a natural formalism for assigning weights to trees, TGs and WTGs enjoy properties that make them a good choice for our model. First, WTGs (resp. TGs) are equi-expressive to WTAs (resp. TAs) and have logical characterizations [9–11]. For this reason, tree grammars are closed under Boolean operations and enjoy decidable equivalence [9]. Second, WTGs enjoy many closure and decidability properties, e.g., given two WTGs G₁ and G₂, we can compute the grammars G₁ ⊕ G₂ and G₁ ⊗ G₂ such that, for every f, w_{G₁⊕G₂}(f) = w_{G₁}(f) ⊕ w_{G₂}(f) and w_{G₁⊗G₂}(f) = w_{G₁}(f) ⊗ w_{G₂}(f). These operations are convenient for building grammars over product semirings.

#### **3.3 QSyGuS**

In this section, we formally define QSyGuS, which extends SyGuS with quantitative objectives. In SyGuS, a problem is specified with respect to a background theory T (e.g., linear arithmetic), and the goal is to synthesize a function f that satisfies two constraints provided by the user. The first constraint describes a *functional semantic property* that f should satisfy and is given as a predicate ψ(f) ≝ ∀x.φ(f, x). The second constraint limits the *search space* S of f and is given as a set of expressions specified by a context-free grammar G defining a subset of all the terms in T. A solution to the SyGuS problem is an expression e in S such that the formula ψ(e) is valid.

We augment this framework in two ways. First, we replace context-free grammars with WTGs, which we use to assign weights (from a given semiring) to terms. Second, we augment the problem formulation with constraints over the weight of the synthesized program (e.g., only consider programs of weight greater than 2) and optimization objectives over the same weight (e.g., find a solution of minimal weight). Weight constraints range over the grammar

$$WC := WC \land WC \mid WC \lor WC \mid \neg WC \mid w \preceq s \mid s \preceq w \mid w \prec s \mid s \prec w,$$

where w is a special variable and s is an element of the semiring under consideration. Given a constraint ω ∈ WC, we write ω(t) to denote the formula obtained by replacing w with t in ω.
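As a small illustration (ours; the tropical order ≤ stands in for ⪯), a weight constraint ω can be viewed as a predicate over the weight variable w:

```python
# ω := w ⪯ 4 ∧ ¬(w ≺ 2), i.e., accept weights between 2 and 4, encoded as
# a Python predicate over the tropical order.
omega = lambda w: w <= 4 and not (w < 2)

print([w for w in range(6) if omega(w)])  # [2, 3, 4]
```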

**Definition 4 (**QSyGuS**).** *A* QSyGuS *problem is a tuple* (T, (**S**, ⪯), ψ(f), G, ω, opt) *where:*

- T *is a background theory;*
- (**S**, ⪯) *is an ordered semiring describing the universe of weights;*
- ψ(f) *is a Boolean constraint describing the semantic property that* f *should satisfy;*
- G *is a WTG over the semiring* **S** *describing the search space;*
- ω *is a weight constraint from the grammar* WC*;*
- opt *is a Boolean denoting whether we want a solution of minimal weight.*

*A solution to the* QSyGuS *problem is a term* e *such that* e ∈ L(G)*,* ψ(e) *is true, and* ω(w_G(e)) *is true. If* opt *is true, we also require that there exists no* g *that satisfies the previous conditions and such that* w_G(g) ≺ w_G(e)*.*

A SyGuS problem is a QSyGuS problem without weight constraints, i.e., ω ≡ true and opt = false. We denote such problems simply as triples (T, ψ(f), G).

*Example 4.* Consider the QSyGuS problem described in Sect. 2. We have already described all of its components except ω and opt. In this example, ω = true and opt = true because we want to synthesize a solution with minimal weight.

#### **4 Solving QSyGuS Problems via Grammar Reduction**

In this section, we present an algorithm for solving QSyGuS problems (Algorithm 1), which works as follows. First, given a QSyGuS problem, we construct (under certain assumptions) a SyGuS problem for which the solution is guaranteed to satisfy the weight constraints ω (line 2) and use existing SyGuS solvers to find a solution to such a problem (line 3). If the QSyGuS problem requires minimization, our algorithm produces a new SyGuS instance to search for a solution that is better than the previously found one and tries to solve it (lines 6-7). This procedure is repeated until an optimal solution is found (line 8).

#### **4.1 From QSyGuS to SyGuS**

The first step of our algorithm is to construct a SyGuS problem characterizing exactly the solutions of the QSyGuS problem that satisfy the weight constraints. Given a QSyGuS problem P = (T, (**S**, ⪯), ψ(f), G, ω, opt), we construct a SyGuS problem P′ = (T, ψ(f), G′) such that a function g is a solution to the SyGuS problem P′ iff g is a solution of the QSyGuS problem (T, (**S**, ⪯), ψ(f), G, ω, false), in which the optimization objective has been dropped. We denote the grammar reduction operation as G′ ← ReduceGrammar(G, ω).

*Base case.* First, we show how to solve the problem when ω is an atomic formula, i.e., of the form w ⪯ s, s ⪯ w, w ≺ s, or s ≺ w. We present the construction for w ⪯ s, as it is identical for the other constraints.

Concretely, we are given a WTG G = (N, Z, P, μ) and we want to construct a TG G_⪯s = (N′, Z′, P′) such that t ∈ L(G_⪯s) iff w_G(t) ⪯ s. In general, it is not possible to perform this construction for arbitrary semirings and grammars. We first present our algorithm and then describe sufficient conditions under which we can ensure termination and correctness.

The idea behind our construction is to introduce new nonterminals in the grammar G_⪯s that keep track of the weights of the trees that can be produced from them. For example, a nonterminal pair (X, s′) will derive all trees derivable from X using a single derivation of weight s′. Therefore, the set of nonterminals N′ is a subset of N × S (plus an initial nonterminal Z′), where S is the universe of the WTG's semiring. We construct our set of nonterminals N′ starting from the leaf productions of G and then recursively exploring the other productions. At the same time, we generate the set of productions P′. Formally, N′ and P′ are the smallest sets such that the following conditions hold.


*Example 5.* We illustrate our construction using the grammar in Fig. 1. Assume the weight constraint is w ⪯ (1, 0) and the partial order is the Pareto product, i.e., we accept terms with at most one if-statement and no plus-statements. Our construction yields the following grammar.

```
Z′ ::= (Start, 1, 0) | (Start, 0, 0)
(Start, 1, 0) ::= if((BExpr, 0, 0)) then (Start, 0, 0) else (Start, 0, 0)
(Start, 0, 0) ::= x | y | 0 | 1
(BExpr, 0, 0) ::= (Start, 0, 0) > (Start, 0, 0) | ¬(BExpr, 0, 0) | (BExpr, 0, 0) ∧ (BExpr, 0, 0)
```
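The worklist construction behind this example can be prototyped as follows (a simplified sketch under our own encoding, not QuaSi's implementation; Fig. 1's leaf productions are collapsed into one, and the semiring is fixed to Trop ×ₛ Trop with the Pareto order):

```python
# Prototype of the base-case reduction for w ⪯ s: annotate nonterminals
# with the exact weight of their derivations, keeping only weights ⪯ bound.
# Productions are encoded as (lhs, op, children, weight).
from itertools import product

def pair_sum(ws):
    return tuple(map(sum, zip(*ws)))

def pareto_le(a, b):
    return all(x <= y for x, y in zip(a, b))

def reduce_grammar(productions, axiom, bound):
    """Return reachable (nonterminal, weight) pairs with weight ⪯ bound."""
    pairs = set()
    changed = True
    while changed:                       # fixpoint; terminates because the
        changed = False                  # weights ⪯ bound form a finite set
        for lhs, _op, children, w in productions:
            # All known ways to derive each child nonterminal.
            options = [[s for (x, s) in pairs if x == c] for c in children]
            for ws in product(*options):
                total = pair_sum([w, *ws])
                if pareto_le(total, bound) and (lhs, total) not in pairs:
                    pairs.add((lhs, total))
                    changed = True
    return pairs, {s for (x, s) in pairs if x == axiom}

# The grammar of Fig. 1; weight (w1, w2) = (#if-statements, #plus-operators).
G = [
    ("Start", "plus", ["Start", "Start"], (0, 1)),
    ("Start", "ite", ["BExpr", "Start", "Start"], (1, 0)),
    ("Start", "leaf", [], (0, 0)),          # x | y | 0 | 1
    ("BExpr", "gt", ["Start", "Start"], (0, 0)),
    ("BExpr", "not", ["BExpr"], (0, 0)),
    ("BExpr", "and", ["BExpr", "BExpr"], (0, 0)),
]
pairs, axiom_weights = reduce_grammar(G, "Start", (1, 0))
print(sorted(axiom_weights))  # [(0, 0), (1, 0)]
```

Trimming the pairs that are unreachable from the axiom then yields the nonterminals of the grammar shown in Example 5.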

The construction of G_⪯s only terminates for certain semirings and grammars, and it only guarantees that individual derivations yield the correct weight, i.e., it does not account for the ⊕-sum of multiple derivations.

*Example 6.* The following WTG over Prob is ambiguous and, if we apply the grammar reduction algorithm for ω := w ⪰ 0.6, the resulting grammar will be empty. However, the tree 1 + 1 has weight 0.9 ⪰ 0.6.

```
Start ::= Start + Start /0.5 | x | 0 | 1 | Expr
Expr  ::= Expr + Expr /0.4 | x | 0 | 1
```
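A quick arithmetic check of this example (a sketch of ours, enumerating by hand the two derivations of 1 + 1 discussed above): over Prob, the weight of a tree is the +-sum over its derivations of the ·-product of production weights, and unweighted productions carry the multiplicative identity 1.

```python
# Weights of the two derivations of the tree 1 + 1.
d1 = 0.5 * 1.0 * 1.0         # Start -> Start + Start /0.5, both children Start -> 1
d2 = 1.0 * 0.4 * 1.0 * 1.0   # Start -> Expr, Expr -> Expr + Expr /0.4, Expr -> 1 twice
weight = d1 + d2
print(weight)  # 0.9
```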

We now identify sufficient conditions under which the construction of G_⪯s terminates and is sound. In particular, we restrict our attention to unambiguous WTGs, which are the common ones in practice. We use weights(G) = {s | p ∈ P ∧ μ(p) = s} to denote the set of weights used by G, and M_{S,G} = (S′, ⊗, 1) to denote the submonoid of **S** generated by weights(G), i.e., the set of all weights we can generate using ⊗ and weights(G).

**Theorem 1.** *Given an unambiguous WTG* G *over a semiring* (**S**, ⪯) *such that* M_{S,G} = (S′, ⊗, 1), *and a weight* s ∈ S, *the construction of* G_⪯s *terminates if the set* {s′ | s′ ⪯ s ∧ s′ ∈ S′} *is finite. Moreover, if the set of weights* weights(G) *is monotonically increasing with respect to* ⪯, *i.e., for every* s′ ∈ S′ *and* s″ ∈ weights(G), s′ ⪯ s′ ⊗ s″, *then* L(G_⪯s) *contains exactly every tree* t *such that* w_G(t) ⪯ s.

The theorem above also holds for the other atomic constraints w ≺ s, s ⪯ w, and s ≺ w (for the last two, the direction of the monotonicity requirement is reversed). Moreover, in certain cases, even if the construction does not terminate for, say, s ⪯ w, it might terminate for the negated constraint w ≺ s. In such a case, we can use the closure properties of regular tree grammars/automata to construct the reduced grammar for s ⪯ w as G_{s⪯w} = intersect(G, complement(G_{w≺s})). The same idea can be applied to all atomic constraints.

In practice, the restriction of Theorem 1 holds for grammars that operate over the Boolean and probabilistic semirings, and over the tropical semiring restricted to positive weights. Theorem 1 never holds when **S** is the tropical semiring and the grammar contains negative weights; in general, one cannot construct the constrained grammar in this case. However, it is easy to modify our algorithm to work with grammars that contain no loops (i.e., derivations from a nonterminal to a tree containing that same nonterminal) with negative weights.

Intuitively, when the grammar contains no negative loops, we can find a constant SH such that any intermediate derivation with weight greater than s + SH can never result in a tree with weight smaller than s. We use this idea to modify the construction of G^Trop_{≤s}, i.e., G_{≤s} for Trop, as follows. First, the constant SH is bounded by ck^{n+1}, where c is the absolute value of the smallest negative weight in the grammar, k is the largest number of nonterminals appearing in one grammar production, and n = |N| is the number of nonterminals. Second, in steps 2 and 3 of the construction, a new nonterminal and the corresponding productions are produced only if μ(p) ≤ s + |SH| (previously μ(p) ≤ s). However, if A = Z in steps 2 and 3, we add a new production Z′ → (A, s′) only if s′ ≤ s.

We now show when this construction terminates and returns a correct result. Since the tropical semiring combines the weights of multiple derivations using the min operator, we can *drop* the requirement that the grammar be unambiguous.

**Theorem 2.** *Given a WTG* G *over Trop and a weight* s ∈ ℤ, *the construction of* G^Trop_{≤s} *terminates if* G *contains no loop with cumulative negative weight. Moreover,* L(G^Trop_{≤s}) *contains exactly every tree* t *such that* w_G(t) ≤ s.

*Composing semirings.* We next discuss how Theorem 1 relates to product semirings. Given a grammar G = (N, Z, P, μ) over a semiring **S**₁ ×ₛ **S**₂, we use G^{Sᵢ} to denote the grammar (N, Z, P, μᵢ) in which the weight function outputs the corresponding projected weight, i.e., if μ(p) = (s₁, s₂), then μᵢ(p) = sᵢ.

Let's first consider the case where the product semiring uses a Pareto partial order. In this case, if Theorem 1 holds for each grammar G^{Sᵢ} and bound wᵢ ⪯ᵢ sᵢ, then it also holds for G and the bound (w₁, w₂) ⪯ₚ (s₁, s₂). However, the other direction is not true. Theorem 3 formalizes this intuition and states that, in some sense, solving Pareto partial orders is easier than solving the individual partial orders.

**Theorem 3.** *Given an unambiguous WTG* G *over the semiring* **S** = **S**₁ ×ₛ **S**₂ *with Pareto partial order* ⪯ₚ = par(⪯₁, ⪯₂) *and a weight* s = (s₁, s₂) ∈ S, *if the constructions of* G^{S₁}_{⪯₁s₁} *and* G^{S₂}_{⪯₂s₂} *terminate, then the construction of* G_{⪯ₚs} *terminates.*

When we move to the Sorted partial order we cannot obtain an analogous theorem: even if Theorem 1 holds for each grammar G^{Sᵢ} and bound wᵢ ⪯ᵢ sᵢ, it does not necessarily hold for G and (w₁, w₂) ⪯ₛ (s₁, s₂). In particular, if the semiring **S**₂ is infinite and there exists an s′₁ ≺₁ s₁, there will be infinitely many elements (s′₁, ·) ≺ₛ (s₁, s₂). Using this observation, we devise a modified algorithm for reducing grammars with Sorted objectives. First, we compute the grammars G^{S₁}_{≺₁s₁}, G^{S₁}_{=s₁}, and G^{S₂}_{≺₂s₂}. Second, we use WTG closure properties to compute G_{≺ₛ(s₁,s₂)} as the union of G^{S₁}_{≺₁s₁} and intersect(G^{S₁}_{=s₁}, G^{S₂}_{≺₂s₂}).
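A set-level sketch of this two-step reduction (ours; finite sets of (term, weight) pairs stand in for grammar languages, and `|`/`&` mirror the union and intersect operations on grammars):

```python
# Terms whose weight (w1, w2) is strictly below (s1, s2) in the Sorted order.
def sorted_lt(space, s1, s2):
    lt1 = {t for t, (w1, _w2) in space if w1 < s1}    # stands in for G^{S1}_{<1 s1}
    eq1 = {t for t, (w1, _w2) in space if w1 == s1}   # stands in for G^{S1}_{= s1}
    lt2 = {t for t, (_w1, w2) in space if w2 < s2}    # stands in for G^{S2}_{<2 s2}
    return lt1 | (eq1 & lt2)                          # union + intersect

space = {("p", (0, 5)), ("q", (1, 1)), ("r", (1, 3)), ("s", (2, 0))}
print(sorted(sorted_lt(space, 1, 3)))  # ['p', 'q']
```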

*General formulas.* We can now inductively construct the grammar accepting only terms satisfying all the constraints in ω. We again use the fact that tree grammars are closed under Boolean operations to compute the intersections and unions corresponding to the conjunctions and disjunctions appearing in the formula.

#### **4.2 Finding an Optimal Solution**

If our QSyGuS problem does not require minimization (i.e., opt = false), the technique presented in Sect. 4.1 can be used to generate an equivalent SyGuS problem P′ = (T, ψ(f), G′), which can be solved using off-the-shelf SyGuS solvers. In this section, we show how to extend this technique to handle minimization objectives. Our idea is to use SyGuS solvers to find a non-optimal solution of P′ and then iteratively refine the grammar G′ to search for a better solution. This loop is illustrated in Algorithm 1 (lines 5-9). Given an initial solution f* of P′ such that w_G(f*) = s, we construct a new grammar G_≺s and look for a solution with lower weight. If the SyGuS solver we use is sound (it finds a solution whenever one exists) and complete (it detects when no solution exists), Algorithm 1 terminates with an optimal solution.

In general, these conditions are too strict, and in practice the algorithm will often not terminate. However, if the SyGuS solver is at least sound, Algorithm 1 will eventually find an optimal solution, although it will not be able to prove that no smaller one exists. In our experiments, we show that this approach can yield better solutions than those produced by vanilla SyGuS solvers even when Algorithm 1 does not terminate.

#### **5 Implementation and Evaluation**

First, we extended the SyGuS format with new syntax for expressing QSyGuS problems. Our format supports all the semirings presented in Sect. 3.1, as well as additional ones. The format also allows creating tuples of semirings using the product operation described in Sect. 3.1. We augment the original SyGuS syntax with support for weights on grammar productions. Weight constraints are added using an SMT-like syntax.

Second, we implemented Algorithm 1 in a tool called QuaSi. QuaSi interfaces with three SyGuS solvers: CVC4 [6], ESolver [4], and EUSolver [5]. QuaSi supports all the semirings allowed in our format and implements a library of operations over tree automata/grammars and weighted tree automata/grammars, as well as several optimizations not discussed in this paper. In particular, QuaSi uses simple grammar reduction techniques to simplify the generated grammars, remove unnecessary productions, and consolidate equivalent ones.

We evaluate QuaSi through the following questions (experiments performed on an Intel Core i7 4.00 GHz CPU with 32 GB of RAM).

**Q1** Can QuaSi solve quantitative variants of real SyGuS benchmarks? (Sect. 5.1)

**Q2** How does the solving time of each iteration of Algorithm 1 compare to the initial SyGuS solving time? (Sect. 5.2)

**Q3** How much does the weight of the solution improve across iterations? (Sect. 5.3)

*Benchmarks.* We perform our evaluation on 26 quantitative extensions of existing SyGuS competition benchmarks taken from 4 SyGuS benchmark tracks [4]: Hacker's Delight, Integers, ICFP, and Bitvector. 18 of our benchmarks only use a minimization objective over a single semiring (Table 1), while 8 use a minimization objective (Pareto or Sorted) over a product semiring (Table 2). We selected SyGuS benchmarks using the following criteria: (*i*) the benchmark can be solved by either CVC4 [6] or ESolver [4], and (*ii*) the solution is not optimal according to some reasonable metric, e.g., size or number of if-statements.

#### **5.1 Effectiveness of QSyGuS Solver**

We evaluate the effectiveness of QuaSi on the 18 single-objective minimization benchmarks. For each benchmark, we run QuaSi with either CVC4 or ESolver as the backend SyGuS solver (we also evaluated QuaSi with EUSolver [5], but we do not report those results due to its poor performance). The results are shown in Table 1. The timeout for each iteration of Algorithm 1 is 10 min.

With CVC4, QuaSi terminates with an optimal solution on 9/18 benchmarks, taking less than 5 s (avg: 0.7 s) to solve each sub-problem. In 3 of these cases, the initial solution is already optimal, and the second iteration is used to prove optimality. With ESolver, QuaSi terminates with an optimal solution on 8/18 benchmarks, taking less than 7 s (avg: 0.9 s) to solve each sub-problem. In one case, it finds a better solution than the original one but cannot prove that the solution is optimal. Overall, by combining solvers, QuaSi can find a better solution than the original SyGuS solution produced by one of the two solvers in 9/18 benchmarks. QuaSi cannot improve the initial solution of the linear integer arithmetic benchmarks (array search and LinExpr eq1ex).

Both solvers time out on large grammars. The grammars in Table 1 are one to two orders of magnitude larger than those in existing SyGuS benchmarks (avg: 224 vs 13 rules), and existing solvers have not yet been optimized for this parameter. In some cases, the solver times out on intermediate grammars that do not contain a solution but generate infinitely many terms. In general, existing SyGuS solvers cannot prove unsatisfiability for these types of problems. To answer **Q1**, QuaSi can **solve quantitative variants of 10/18 real SyGuS benchmarks**.


**Table 1.** Performance of QuaSi. **Time** shows the sequence of times taken to solve the individual iterations of Algorithm 1. **Largest** is the size of the largest SyGuS sub-problem.

#### **5.2 Solving Time for Different Iterations**

In this section, we evaluate the time required by each iteration of Algorithm 1. Figure 2 shows the ratio between the time taken by each iteration and the initial, non-quantitative SyGuS solving time. Some of the iterations shown in Table 1 do not appear in Fig. 2 because they yielded no solution, i.e., the initial solution was already minimal. CVC4 is typically slower in subsequent iterations and can take up to 10 times the original solving time, while ESolver has runtimes comparable to the initial run and is often faster. These numbers are largely explained by how the two solvers work: CVC4 is optimized for problems where the grammar imposes no restrictions on the structure of the solution, while ESolver performs enumerative search and takes advantage of more restrictive grammars.

One interesting case is the parity not benchmark. ESolver takes 26.9 s to find an initial solution, but with the weight constraint w < 11, a solution can be found in 2.2 s. CVC4 can find the initial solution of weight 11 in 0.1 s but cannot solve the next iteration. We tried using different solvers in different iterations of our algorithm and found that, if we use CVC4 to find an initial solution and then ESolver in subsequent iterations

**Fig. 2.** Solving time across iterations

with restricted grammars, we can fully solve this benchmark in a total of 2.3 s, which is much better than the time taken by either solver alone. To answer **Q2**, with appropriate choices of solvers, **the overhead of synthesizing optimal solutions is minimal**.

#### **5.3 Solution Weight Across Iterations**

In this section, we present how the weight of the synthesized solutions changes across the iterations of Algorithm 1. Figure 3 shows the weight of the solution synthesized at each iteration as a percentage of the weight of the initial SyGuS solution. The results show that we can improve the solutions of CVC4 by 15–25% in one iteration, and the solutions of ESolver by 20–50% when taking one iteration and by 50–60% when

**Fig. 3.** Solution weight across iterations.

taking two. The Prob benchmarks, which require two iterations, can be improved more when using ESolver because ESolver tends to synthesize small terms whose probability may also be small. To answer **Q3**, QuaSi can **improve the weights of SyGuS solutions by 20–60**%.

#### **5.4 Multi-objective Optimization**

In this section, we evaluate the effectiveness of QuaSi on the 8 benchmarks involving two minimization objectives. The benchmarks consist of two families: 4 for sorted optimization and 4 for Pareto optimization. The sorted optimization benchmarks ask to minimize first the number of occurrences of a specified operator (bvand in hackers and ite in array search) and then the size of the solution. The Pareto optimization benchmarks have the same objectives as the sorted ones, but here we synthesize a Pareto-optimal solution instead of a sorted-optimal one. The results are shown in Table 2. We do not present the results for CVC4 because it cannot solve any of the benchmarks.

The array search benchmark times out since it is already hard with a single objective. For the hackers 5 benchmarks, the initial solution is already optimized for the first objective, so the problem degenerates to the single-objective optimization problem. For hackers 7 and hackers 17, we present the weights of the intermediate solutions; we can see that Pareto and sorted optimization yield different solutions. To answer **Q4**, QuaSi can **solve problems with multiple objectives** when the same problems are feasible with a single objective.


**Table 2.** Performance of QuaSi on multi-objective benchmarks. **Weight** denotes the sequence of weights explored during minimization.

#### **6 Related Work**

*Qualitative Synthesis.* Existing program synthesizers fall into three categories: (*i*) enumeration solvers, which typically output the smallest program [1]; (*ii*) symbolic solvers, which reduce the synthesis problem to a constraint-solving problem and output whatever program is produced by the constraint solver [21]; and (*iii*) probabilistic synthesizers, which randomly search the space for a solution and are typically unpredictable [18]. Since the introduction of the SyGuS format [2], these techniques have been used to build several SyGuS solvers that have competed in SyGuS competitions [4]. The most effective ones, which are used in this paper, are ESolver and EUSolver [1] (enumeration) and CVC4 [6] (symbolic).
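To make the first category concrete, a toy enumeration solver can search terms in order of size and return the first one consistent with the examples, and hence the smallest. The grammar and helper names below are hypothetical, not taken from any of the cited solvers:

```python
# Toy enumeration solver (illustrative; grammar and names are hypothetical).
# Terms over the grammar  e ::= x | 0 | 1 | e + e  are generated in order of
# size, so the first term consistent with all examples is also the smallest.

def terms_of_size(n):
    """Yield (description, function) pairs for all terms of exactly size n."""
    if n == 1:
        yield ("x", lambda x: x)
        yield ("0", lambda x: 0)
        yield ("1", lambda x: 1)
        return
    # A term of size n > 1 is (left + right) with size(left) + size(right) = n - 1.
    for left in range(1, n - 1):
        for dl, fl in terms_of_size(left):
            for dr, fr in terms_of_size(n - 1 - left):
                yield (f"({dl} + {dr})", lambda x, fl=fl, fr=fr: fl(x) + fr(x))

def enumerate_smallest(examples, max_size=9):
    """Return the smallest term consistent with all (input, output) examples."""
    for n in range(1, max_size + 1):
        for desc, f in terms_of_size(n):
            if all(f(i) == o for i, o in examples):
                return desc
    return None

print(enumerate_smallest([(1, 3), (2, 5)]))  # → (x + (x + 1))
```

Symbolic solvers would instead hand the same constraints to a SAT/SMT backend, with no size bias beyond what the encoding imposes.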

*Quantitative synthesis.* Domain-specific synthesizers typically employ hard-coded ranking functions that guide the search towards a "preferable" program [17], but these functions are typically hard to write and are decoupled from the functional specification. Unlike QSyGuS, these synthesizers allow arbitrary ranking functions to be expressed in general-purpose languages, but they typically support only limited grammars for synthesis. Moreover, in many practical applications the ranking functions are very simple. For example, the popular spreadsheet formula synthesizer FlashFill [12] uses a ranking function that prefers small programs with few constants. This type of objective is expressible in our framework.

The Sketch synthesizer supports optimization objectives over variables in sketched programs [20]. This work differs from ours in that sketches are a different specification mechanism from SyGuS. In Sketch the search space is encoded as a program with holes to facilitate synthesis by constraint solving. Translating SyGuS problems into sketches is non-trivial and results in poor performance.

The work closest to ours is Synapse [7], which combines sketching with an approach similar to ours. For the same reasons as for Sketch, Synapse differs from our work in that it uses a different search-space mechanism. However, there are a few analogies between our work and Synapse that are worth explaining in detail. Synapse supports syntactic cost functions that are defined using a decidable theory, separately from the sketch search space. Synthesis is done using an iterative search where sketches—i.e., sets of partial programs with holes—of increasing sizes are given to the synthesizer. At a high level, the intermediate sketches are related to our notion of reduced grammars—i.e., they accept solutions of weight less than a given constant. However, while our algorithm generates reduced grammars automatically for a well-defined family of semirings, Synapse requires the user to provide a function for generating the intermediate sketches. Moreover, since Synapse requires cost functions that are defined using a decidable theory, it would not support certain families of costs that QSyGuS supports—e.g., the probabilistic semiring.

Koukoutos et al. [15] have proposed the use of probabilistic tree grammars to guide the search of enumerative synthesizers in applications outside of SyGuS. Their algorithm enumerates all terms accepted by the grammar in order of decreasing probability using a variant of the search algorithm A<sup>∗</sup> and requires the grammar not to contain transitions of weight 1 to avoid getting stuck. Probabilistic tree grammars are a special case of QSyGuS, and our algorithm does not impose limitations on what weights can appear in the grammar. Moreover, our algorithm does not require implementing a new solver when the cost semiring changes.
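As an illustration of this style of search (a sketch, not the implementation of [15]), the following best-first enumeration emits the complete terms of a small hypothetical probabilistic grammar in order of decreasing probability. Note how a production of probability 1 would leave priorities unchanged, which is why such transitions must be excluded:

```python
import heapq

# Best-first enumeration over a small hypothetical probabilistic grammar.
# A term's probability is the product of the production probabilities used,
# so complete terms pop off the priority queue in decreasing probability.

GRAMMAR = {  # nonterminal -> list of (probability, right-hand side)
    "S": [(0.5, ["x"]), (0.3, ["1"]), (0.2, ["(", "S", "+", "S", ")"])],
}

def enumerate_by_probability(limit):
    """Return the `limit` most probable complete terms, best first."""
    counter = 0  # tie-breaker so the heap never compares symbol lists
    heap = [(-1.0, counter, ["S"])]  # (-probability, tie, sentential form)
    out = []
    while heap and len(out) < limit:
        neg_p, _, form = heapq.heappop(heap)
        idx = next((i for i, s in enumerate(form) if s in GRAMMAR), None)
        if idx is None:  # no nonterminal left: a complete term
            out.append(("".join(form), -neg_p))
            continue
        for prob, rhs in GRAMMAR[form[idx]]:
            counter += 1
            heapq.heappush(heap, (neg_p * prob, counter,
                                  form[:idx] + rhs + form[idx + 1:]))
    return out

for term, p in enumerate_by_probability(3):
    print(f"{p:.2f}  {term}")  # 0.50 x, then 0.30 1, then 0.05 (x+x)
```

A production with probability 1 would push an expanded form with the same priority as its parent, so an infinite derivation chain could starve all complete terms.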

#### **7 Conclusion**

We presented QSyGuS, a general framework for defining and solving SyGuS problems in the presence of quantitative objectives over the syntax of the programs. QSyGuS is (*i*) *natural*: it requires minimal modifications to the SyGuS format; (*ii*) *general*: it supports complex but practical types of weights; (*iii*) *formal*: it is grounded in the theory of weighted tree grammars; and (*iv*) *effective*: our tool QuaSi can solve quantitative variations of existing SyGuS benchmarks with little overhead. In the future, we plan to extend our framework to handle probabilistic objectives and quantitative objectives over the semantics of the program—e.g., synthesizing programs that satisfy most of the specification.

**Acknowledgements.** The authors were supported by National Science Foundation Grants CCF-1637516, CCF-1704117 and a Google Research Award.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Learning

### **Learning Abstractions for Program Synthesis**

Xinyu Wang1(B), Greg Anderson1(B), Isil Dillig1(B), and K. L. McMillan2(B)

<sup>1</sup> University of Texas, Austin, USA {xwang,ganderso,isil}@cs.utexas.edu <sup>2</sup> Microsoft Research, Redmond, USA kenmcmil@microsoft.com

**Abstract.** Many example-guided program synthesis techniques use *abstractions* to prune the search space. While abstraction-based synthesis has proven to be very powerful, a domain expert needs to provide a suitable abstract domain, together with the abstract transformers of each DSL construct. However, coming up with useful abstractions can be non-trivial, as it requires both domain expertise and knowledge about the synthesizer. In this paper, we propose a new technique for learning abstractions that are useful for instantiating a general synthesis framework in a new domain. Given a DSL and a small set of training problems, our method uses *tree interpolation* to infer reusable predicate templates that speed up synthesis in a given domain. Our method also learns suitable abstract transformers by solving a certain kind of second-order constraint solving problem in a data-driven way. We have implemented the proposed method in a tool called Atlas and evaluate it in the context of the Blaze meta-synthesizer. Our evaluation shows that (a) Atlas can learn useful abstract domains and transformers from few training problems, and (b) the abstractions learned by Atlas allow Blaze to achieve significantly better results compared to manually-crafted abstractions.

#### **1 Introduction**

Program synthesis is a powerful technique for automatically generating programs from high-level specifications, such as input-output examples. Due to its myriad use cases across a wide range of application domains (e.g., spreadsheet automation [1–3], data science [4–6], cryptography [7,8], improving programming productivity [9–11]), program synthesis has received widespread attention from the research community in recent years.

Because program synthesis is, in essence, a very difficult search problem, many recent solutions prune the search space by utilizing *program abstractions* [4,12–16]. For example, state-of-the-art synthesis tools, such as Blaze [14], Morpheus [4] and Scythe [16], symbolically execute (partial) programs over some abstract domain and reject those programs whose abstract behavior is inconsistent with the given specification. Because many programs share the same behavior in terms of their abstract semantics, the use of abstractions allows these synthesis tools to significantly reduce the search space.

© The Author(s) 2018

**Fig. 1.** Schematic overview of our approach.

While the abstraction-guided synthesis paradigm has proven to be quite powerful, a down-side of such techniques is that they require a domain expert to manually come up with a suitable abstract domain and write abstract transformers for each DSL construct. For instance, the Blaze synthesis framework [14] expects a domain expert to manually specify a universe of predicate templates, together with sound abstract transformers for every DSL construct. Unfortunately, this process is not only time-consuming but also requires significant insight about the application domain as well as the internal workings of the synthesizer.

In this paper, we propose a novel technique for automatically learning domain-specific abstractions that are useful for instantiating an example-guided synthesis framework in a new domain. Given a DSL and a training set of synthesis problems (i.e., input-output examples), our method learns a useful abstract domain in the form of predicate templates and infers sound abstract transformers for each DSL construct. In addition to eliminating the significant manual effort required from a domain expert, the abstractions learned by our method often outperform manually-crafted ones in terms of their benefit to synthesizer performance.

The workflow of our approach, henceforth called Atlas<sup>1</sup>, is shown schematically in Fig. 1. Since Atlas is meant to be used as an *off-line* training step for a general-purpose programming-by-example (PBE) system, it takes as input a DSL as well as a set of synthesis problems *E* that can be used for training purposes. Given these inputs, our method enters a refinement loop where an *Abstraction Learner* component discovers a sequence of increasingly precise abstract domains A1, ..., An and their corresponding abstract transformers T1, ..., Tn in order to help the *Abstraction-Guided Synthesizer* (AGS) solve all training problems. While the AGS can reject many incorrect solutions using an abstract domain Ai, it might still return some incorrect solutions due to the insufficiency of Ai. Thus, whenever the AGS returns an incorrect solution to any training problem, the Abstraction Learner discovers a more precise abstract domain and automatically synthesizes the corresponding abstract transformers. Upon termination of the algorithm, the final abstract domain An and transformers Tn are sufficient for the AGS to correctly solve *all* training problems. Furthermore, because our method learns *general* abstractions in the form of

<sup>1</sup> Atlas stands for AuTomated Learning of AbStractions.

predicate templates, the learned abstractions are expected to be useful for solving many *other* synthesis problems beyond those in the training set.

From a technical perspective, the Abstraction Learner uses two key ideas, namely *tree interpolation* and *data-driven constraint solving*, for learning useful abstract domains and transformers respectively. Specifically, given an incorrect program P that cannot be refuted by the AGS using the current abstract domain Ai, the Abstraction Learner generates a tree interpolant Ii that serves as a proof of P's incorrectness and constructs a new abstract domain Ai+1 by extracting templates from the predicates used in Ii. The Abstraction Learner also synthesizes the corresponding abstract transformers for Ai+1 by setting up a *second-order constraint solving* problem where the goal is to find the unknown relationship between the symbolic constants used in the predicate templates. Our method solves this problem in a data-driven way by sampling input-output examples for DSL operators, ultimately reducing the transformer-learning problem to solving a system of linear equations.

We have implemented these ideas in a tool called Atlas and evaluate it in the context of the Blaze program synthesis framework [14]. Our evaluation shows that the proposed technique eliminates the manual effort involved in designing useful abstractions. More surprisingly, our evaluation also shows that the abstractions generated by Atlas outperform manually-crafted ones in terms of the performance of the Blaze synthesizer in two different application domains.

To summarize, this paper makes the following key contributions:


#### **2 Illustrative Example**

Suppose that we wish to use the Blaze meta-synthesizer to automate the class of string transformations considered by FlashFill [1] and BlinkFill [17]. In the original version of the Blaze framework [14], a domain expert needs to come up with a universe of suitable predicate templates as well as abstract transformers for each DSL construct. We will now illustrate how Atlas automates this process, given a suitable DSL and its semantics (e.g., the one used in [17]).

In order to use Atlas, one needs to provide a set of synthesis problems <sup>E</sup> (i.e., input-output examples) that will be used in the training process. Specifically, let us consider the three synthesis problems given below:

$$\mathcal{E} = \left\{ \begin{array}{l} \mathcal{E}\_1: \left\{ \texttt{"CAV"} \to \texttt{"CAV2018"},\ \texttt{"SAS"} \to \texttt{"SAS2018"},\ \texttt{"FSE"} \to \texttt{"FSE2018"} \right\}, \\ \mathcal{E}\_2: \left\{ \ldots \right\}, \\ \mathcal{E}\_3: \left\{ \ldots \right\} \end{array} \right\}$$

In order to construct the abstract domain A and transformers T, Atlas starts with the trivial abstract domain A0 = {⊤} and transformers T0, defined as [[F(⊤, ..., ⊤)]]<sup>♯</sup> = ⊤ for each DSL construct F. Using this abstraction, Atlas invokes Blaze to find a program P0 that satisfies specification E1 under the current abstraction (A0, T0). However, since the program P0 returned by Blaze is incorrect with respect to the concrete semantics, Atlas tries to find a more precise abstraction that allows Blaze to succeed.

Towards this goal, Atlas enters a refinement loop that culminates in the discovery of the abstract domain A1 = {⊤, *len*(α) = c, *len*(α) ≠ c}, where α denotes a variable and c is an integer constant. In other words, A1 tracks equality and inequality constraints on the length of strings. After learning these predicate templates, Atlas also synthesizes the corresponding abstract transformers T1. In particular, for each DSL construct, Atlas learns one abstract transformer for each combination of predicate templates used in A1. For instance, for the Concat operator, which returns the concatenation y of two strings x1, x2, Atlas synthesizes the following abstract transformers, where ⋆ denotes an arbitrary predicate:

$$\mathcal{T}\_1 = \left\{ \begin{array}{l} [\![\texttt{Concat}(\star, \top)]\!]^\sharp = \top \\ [\![\texttt{Concat}(\top, \star)]\!]^\sharp = \top \\ [\![\texttt{Concat}(\mathit{len}(x\_1) \neq \mathsf{c}\_1, \mathit{len}(x\_2) \neq \mathsf{c}\_2)]\!]^\sharp = \top \\ [\![\texttt{Concat}(\mathit{len}(x\_1) = \mathsf{c}\_1, \mathit{len}(x\_2) = \mathsf{c}\_2)]\!]^\sharp = (\mathit{len}(y) = \mathsf{c}\_1 + \mathsf{c}\_2) \\ [\![\texttt{Concat}(\mathit{len}(x\_1) = \mathsf{c}\_1, \mathit{len}(x\_2) \neq \mathsf{c}\_2)]\!]^\sharp = (\mathit{len}(y) \neq \mathsf{c}\_1 + \mathsf{c}\_2) \\ [\![\texttt{Concat}(\mathit{len}(x\_1) \neq \mathsf{c}\_1, \mathit{len}(x\_2) = \mathsf{c}\_2)]\!]^\sharp = (\mathit{len}(y) \neq \mathsf{c}\_1 + \mathsf{c}\_2) \end{array} \right\}$$
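To see how such transformers act on abstract values, the sketch below encodes the length domain and the Concat transformer in executable form; the tuple encoding ("eq"/"neq"/TOP) is our own illustrative choice, not Atlas's representation:

```python
# Illustrative encoding of the length domain: TOP carries no information,
# ("eq", c) means len = c, and ("neq", c) means len != c. The transformer
# mirrors the rules for Concat: both lengths definite -> definite sum; one
# definite and one "neq" -> the sum is ruled out; otherwise nothing is known.

TOP = ("top",)

def concat_transformer(a, b):
    """Abstract transformer for y = Concat(x1, x2) on length predicates."""
    if a == TOP or b == TOP:
        return TOP
    (ka, ca), (kb, cb) = a, b
    if ka == "eq" and kb == "eq":
        return ("eq", ca + cb)
    if {ka, kb} == {"eq", "neq"}:
        return ("neq", ca + cb)
    return TOP  # two "neq" inputs: the output length could be anything

def alpha(s):
    """Abstraction of a concrete string: its exact length."""
    return ("eq", len(s))

print(concat_transformer(alpha("CAV"), alpha("2018")))  # → ('eq', 7)
```

For instance, len(x1) = 3 and len(x2) ≠ 4 soundly yield len(y) ≠ 7, since the only excluded sum is 3 + 4.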

Since the AGS can successfully solve E1 using (A1, T1), Atlas now moves on to the next training problem.

For synthesis problem E2, the current abstraction (A1, T1) is *not* sufficient for Blaze to discover the correct program. After processing E2, Atlas refines the abstract domain to the following set of predicate templates:

$$\mathcal{A}\_2 = \left\{ \top,\ \mathit{len}(\alpha) = \mathsf{c},\ \mathit{len}(\alpha) \neq \mathsf{c},\ \mathit{charAt}(\alpha, \mathsf{i}) = \mathsf{c},\ \mathit{charAt}(\alpha, \mathsf{i}) \neq \mathsf{c} \right\}$$

Observe that Atlas has discovered two additional predicate templates that track positions of characters in the string. Atlas also learns the corresponding abstract transformers T2 for A2.

Moving on to the final training problem E3, Blaze can already solve it successfully using (A2, T2); thus, Atlas terminates with this abstraction.

#### **3 Overall Abstraction Learning Algorithm**

Our top-level algorithm for learning abstractions, called LearnAbstractions, is shown in Fig. 2. The algorithm takes two inputs, namely a domain-specific

```
Fig. 2. Overall learning algorithm. Constructs gives the DSL constructs in L.
```
language L (both syntax and semantics) as well as a set of training problems *E*, where each problem is specified as a *set* of input-output examples Ei. The output of our algorithm is a pair (A, T), where A is an abstract domain represented by a set of predicate templates and T is the corresponding set of abstract transformers.

At a high level, the LearnAbstractions procedure starts with the most imprecise abstraction (consisting of just ⊤) and incrementally improves the precision of the abstract domain A whenever the AGS fails to synthesize the correct program using A. Specifically, the outer loop (lines 4–10) considers each training instance Ei and performs a fixed-point computation (lines 5–10) that terminates when the current abstract domain A is good enough to solve problem Ei. Thus, upon termination, the learned abstract domain A is sufficiently precise for the AGS to solve all training problems *E*.

Specifically, in order to find an abstraction that is sufficient for solving Ei, our algorithm invokes the AGS with the current abstract domain A and corresponding transformers T (line 6). We assume that Synthesize returns a program P that is consistent with Ei under the abstraction (A, T). That is, symbolically executing P (according to T) on the inputs of Ei yields abstract values ϕ that are consistent with the outputs of Ei (i.e., ∀j. Eij<sup>out</sup> ∈ γ(ϕj)). However, while P is guaranteed to be consistent with Ei under the abstract semantics, it may not satisfy Ei under the concrete semantics. We refer to such a program P as *spurious*.

Thus, whenever the call to IsCorrect fails at line 8, we invoke the LearnAbstractDomain procedure (line 9) to learn additional predicate templates that are added to A. Since the refinement of A necessitates the synthesis of new transformers, we then call LearnTransformers (line 10) to learn a new T. The new abstraction is guaranteed to rule out the spurious program P as long as there is a unique best transformer for each DSL construct in domain A.
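The control flow of this procedure can be sketched with a deliberately tiny stand-in instantiation. All candidate programs, predicates, and names below are hypothetical simplifications; in particular, the real LearnAbstractDomain uses tree interpolation rather than the shortcut taken in `learn_predicates`:

```python
# A toy stand-in for the LearnAbstractions refinement loop (illustrative).

PROGRAMS = [  # candidate "DSL programs", enumerated smallest-first
    ("identity", lambda x: x),
    ("append_18", lambda x: x + "18"),
    ("append_2018", lambda x: x + "2018"),
]

def synthesize(examples, predicates):
    """AGS stand-in: first program whose outputs agree with the expected
    outputs on every predicate of the current abstract domain."""
    for name, f in PROGRAMS:
        if all(all(p(f(i)) == p(o) for p in predicates) for i, o in examples):
            return name, f
    return None

def is_correct(prog, examples):
    _, f = prog
    return all(f(i) == o for i, o in examples)

def learn_predicates(prog, examples):
    """Abstraction-learner stand-in: add a length-equality predicate for
    each example the spurious program gets wrong."""
    _, f = prog
    return [lambda s, n=len(o): len(s) == n for i, o in examples if f(i) != o]

def learn_abstractions(training):
    predicates = []  # the empty set plays the role of A0 = {T}: rejects nothing
    for examples in training:
        while True:  # fixed-point computation for this training problem
            prog = synthesize(examples, predicates)
            if prog is None or is_correct(prog, examples):
                break
            predicates += learn_predicates(prog, examples)
    return predicates

preds = learn_abstractions([[("CAV", "CAV2018")]])
print(synthesize([("CAV", "CAV2018")], preds)[0])  # → append_2018
```

With no predicates, the spurious `identity` program is returned; after one refinement, the length predicate rejects it and the correct program is found.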

#### **4 Learning Abstract Domain Using Tree Interpolation**

In this section, we present the LearnAbstractDomain procedure: given a spurious program P and a synthesis problem E that P does not solve, our goal is to find new predicate templates to add to the abstract domain A such that the Abstraction-Guided Synthesizer no longer returns P as a valid solution to the synthesis problem E. Our key insight is that we can mine such useful predicate templates by constructing a *tree interpolation* problem. In what follows, we first review tree interpolants (based on [18]) and then explain how we use this concept to find useful predicate templates.

**Definition 1 (Tree interpolation problem).** *A tree interpolation problem* T = (V, r, P, L) *is a directed labeled tree, where* V *is a finite set of nodes,* r ∈ V *is the root,* P : (V \ {r}) → V *is a function that maps children nodes to their parents, and* L : V → F *is a labeling function that maps nodes to formulas from a set* F *of first-order formulas such that* ⋀<sub>v∈V</sub> L(v) *is unsatisfiable.*

In other words, a tree interpolation problem is defined by a tree T where each node is labeled with a formula and the conjunction of these formulas is unsatisfiable. In what follows, we write *Desc*(v) to denote the set of all descendants of node v, including v itself, and we write *NonDesc*(v) to denote all nodes other than those in *Desc*(v) (i.e., <sup>V</sup> \*Desc*(v)). Also, given a set of nodes <sup>V</sup> , we write L(V ) to denote the set of all formulas labeling nodes in V .

Given a tree interpolation problem <sup>T</sup>, a *tree interpolant* <sup>I</sup> is an annotation from every node in V to a formula such that the label of the root node is *false* and the label of an internal node v is entailed by the conjunction of annotations of its children nodes. More formally, a tree interpolant is defined as follows:

**Definition 2 (Tree interpolant).** *Given a tree interpolation problem* T = (V, r, P, L)*, a tree interpolant for* <sup>T</sup> *is a function* <sup>I</sup> : <sup>V</sup> → <sup>F</sup> *that satisfies the following conditions:*

1. I(r) = *false*;
2. for each node v ∈ V with children v1, ..., vk: (I(v1) ∧ ··· ∧ I(vk) ∧ L(v)) ⇒ I(v);
3. for each node v ∈ V: Vars(I(v)) ⊆ Vars(L(*Desc*(v))) ∩ Vars(L(*NonDesc*(v))).
Intuitively, the first condition ensures that I establishes the unsatisfiability of formulas in T, and the second condition states that I is a valid annotation. As standard in Craig interpolation [19,20], the third condition stipulates a "shared vocabulary" condition by ensuring that the annotation at each node v refers to the common variables between the descendants and non-descendants of v.

**Fig. 3.** A tree interpolation problem and a tree interpolant (underlined).

**Fig. 4.** Algorithm for learning abstract domain using tree interpolation.

*Example 1.* Consider the tree interpolation problem T = (V, r, P, L) in Fig. 3, where L(v) is shown to the right of each node v. A tree interpolant I for this problem maps each node to the corresponding underlined formula. For instance, we have I(v1) = (*len*(v1) = 7). It is easy to confirm that I is a valid interpolant according to Definition 2.

To see how tree interpolation is useful for learning predicates, suppose that the spurious program P is represented as an abstract syntax tree (AST), where each non-leaf node is labeled with the axiomatic semantics of the corresponding DSL construct. Now, since P does not satisfy the given input-output example (e*in*, e*out*), we are able to use this information to construct a labeled tree where the conjunction of labels is unsatisfiable. Our key idea is to mine useful predicate templates from the formulas used in the resulting tree interpolant.

With this intuition in mind, let us consider the LearnAbstractDomain procedure shown in Fig. 4: The algorithm uses a procedure called Construct-Tree to generate a tree interpolation problem T for each input-output example (e*in*, e*out*)<sup>2</sup> that program <sup>P</sup> does not satisfy (line 5). Specifically, letting <sup>Π</sup> denote the AST representation of <sup>P</sup>, we construct <sup>T</sup> = (V, r, P, L) as follows:


$$L(v) = \begin{cases} v' = e\_{out} & v \text{ is the dummy root node with child } v' \\ v = e\_{in} & v \text{ is a leaf representing program input } e\_{in} \\ v = c & v \text{ is a leaf representing constant } c \\ \phi\_F[\mathbf{v}'/\mathbf{x}, v/y] & v \text{ represents DSL operator } F \text{ with axiomatic semantics } \phi\_F(\mathbf{x}, y) \text{ and } \mathbf{v}' \text{ represents the children of } v \end{cases}$$

<sup>2</sup> Without loss of generality, we assume that programs take a single input x, as we can always represent multiple inputs as a list.

Essentially, the ConstructTree procedure labels any leaf node representing the program input with the input example e*in* and the root node with the output example e*out*. All other internal nodes are labeled with the axiomatic semantics of the corresponding DSL operator (modulo renaming).<sup>3</sup> Observe that the formula ⋀<sub>v∈V</sub> L(v) is guaranteed to be unsatisfiable since P does not satisfy the I/O example (e*in*, e*out*); thus, we can obtain a tree interpolant for T.

*Example 2.* Consider program <sup>P</sup> : Concat(x, "18") which concatenates constant string "18" to input x. Figure 3 shows the result of invoking ConstructTree for <sup>P</sup> and input-output example ("CAV", "CAV2018"). As mentioned in Example 1, the tree interpolant I for this problem is indicated with the underlined formulas.
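The unsatisfiability underlying this example can be checked directly by propagating the leaf labels through Concat's concrete semantics; the small script below is only an illustration of why the conjunction of labels produced by ConstructTree is unsat:

```python
# Direct check (illustrative) that the ConstructTree labels for
# P = Concat(x, "18") and the example ("CAV", "CAV2018") are jointly
# unsatisfiable: the leaf labels force a root value that contradicts the
# root label v' = e_out.

e_in, e_out = "CAV", "CAV2018"

x = e_in              # leaf label: v = e_in
const = "18"          # leaf label: v = c
v_concat = x + const  # internal node: Concat's (concrete) semantics
root_consistent = (v_concat == e_out)

print(v_concat, root_consistent)  # → CAV18 False: the conjunction is unsat
```

The interpolant in Fig. 3 captures this conflict more abstractly: it only needs the *lengths* 5 and 7 to disagree, not the concrete strings.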

Since the tree interpolant I effectively establishes the incorrectness of program P, the predicates used in I serve as useful abstract values that the synthesizer (AGS) should consider during the synthesis task. Towards this goal, the LearnAbstractDomain algorithm iterates over each predicate used in I (lines 7–8 in Fig. 4) and converts it to a suitable template by replacing the constants and variables used in I(v) with symbolic names (or "holes"). Because the original predicates in I may be too specific to the current input-output example, extracting templates from the interpolant allows our method to learn reusable abstract domains.

*Example 3.* Given the tree interpolant I from Example 1, LearnAbstractDomain extracts two predicate templates, namely *len*(α) = c and *len*(α) ≠ c.
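A minimal sketch of this template-extraction step, assuming a simple textual predicate syntax (the regex-based rewriting below is our own illustration, not Atlas's internal representation):

```python
import re

# Sketch of template extraction: tree-node variables such as v1 become the
# hole α, and integer constants become the symbolic constant c. The textual
# predicate syntax here is an assumption made for illustration.

def to_template(predicate):
    t = re.sub(r"\bv\d+\b", "α", predicate)  # variables -> hole α
    t = re.sub(r"\b\d+\b", "c", t)           # integer constants -> c
    return t

interpolant_predicates = ["len(v1) = 7", "len(v2) != 7"]
templates = list(dict.fromkeys(to_template(p) for p in interpolant_predicates))
print(templates)  # → ['len(α) = c', 'len(α) != c']
```

Deduplicating the templates (here via `dict.fromkeys`) matters: distinct interpolant predicates often collapse to the same reusable template.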

#### **5 Synthesis of Abstract Transformers**

In this section, we turn our attention to the LearnTransformers procedure for synthesizing abstract transformers T for a given abstract domain A. Following the presentation in prior work [14], we consider abstract transformers described by equations of the following form:

$$\|F(\chi\_1(x\_1, \mathbf{c}\_1), \dots, \chi\_n(x\_n, \mathbf{c}\_n))\|^\sharp = \bigwedge\_{1 \le j \le m} \chi'\_j(y, \mathbf{f}\_j(\mathbf{c})) \tag{1}$$

Here, F is a DSL construct, χi and χ′j are predicate templates<sup>4</sup>, xi is the i-th input of F, y is F's output, **c**1, ..., **c**n are vectors of *symbolic* constants, and **f**j denotes a vector of *affine functions* over **c** = (**c**1, ..., **c**n). Intuitively, given concrete predicates describing the inputs to F, the transformer returns concrete predicates describing the output. Given such a transformer τ, let Outputs(τ) be the set of pairs (χ′j, **f**j) in Eq. 1.

<sup>3</sup> Here, we assume access to the DSL's axiomatic semantics. If this is not the case (i.e., we are only given the DSL's operational semantics), we can still annotate each node as v = c where c denotes the output of the partial program rooted at node v when executed on e*in*. However, this may affect the quality of the resulting interpolant.

<sup>4</sup> We assume that χ′1, ..., χ′m are distinct.

**Fig. 5.** Algorithm for synthesizing abstract transformers. φ<sup>F</sup> at line 6 denotes the axiomatic semantics of DSL construct F. Formula Λ at line 8 refers to Eq. 5.

We define the soundness of a transformer τ for DSL operator F with respect to F's axiomatic semantics φ<sup>F</sup> . In particular, we say that the abstract transformer from Eq. 1 is *sound* if the following implication is valid:

$$\left(\phi\_F(x, y) \land \bigwedge\_{1 \le i \le n} \chi\_i(x\_i, \mathbf{c}\_i)\right) \Rightarrow \bigwedge\_{1 \le j \le m} \chi'\_j(y, \mathbf{f}\_j(\mathbf{c})) \tag{2}$$

That is, the transformer for F is sound if the (symbolic) output predicate is indeed implied by the (symbolic) input predicates according to F's semantics.

Our key observation is that the problem of learning sound transformers can be reduced to solving the following *second-order constraint solving* problem:

$$\exists \mathbf{f}.\ \forall \mathbf{V}. \left( \left( \phi\_F(\mathbf{x}, y) \land \bigwedge\_{1 \le i \le n} \chi\_i(x\_i, \mathbf{c}\_i) \right) \Rightarrow \bigwedge\_{1 \le j \le m} \chi'\_j(y, \mathbf{f}\_j(\mathbf{c})) \right) \tag{3}$$

where **f** = **f**_1, ···, **f**_m and **V** includes all variables and functions from Eq. 2 other than **f**. In other words, the goal of this constraint solving problem is to find interpretations of the unknown functions **f** that make Eq. 2 valid. Our key insight is to solve this problem in a *data-driven* way by exploiting the fact that each unknown function f_{j,k} is affine.

Towards this goal, we first express each affine function f_{j,k}(**c**) as follows:

$$f_{j,k}(\mathbf{c}) = p_{j,k,1} \cdot c_1 + \dots + p_{j,k,|\mathbf{c}|} \cdot c_{|\mathbf{c}|} + p_{j,k,|\mathbf{c}|+1}$$

where each p_{j,k,l} corresponds to an unknown integer constant that we would like to learn. Now, arranging the coefficients of the functions f_{j,1}, ···, f_{j,|**f**_j|} in **f**_j into a |**f**_j| × (|**c**| + 1) matrix P_j, we can represent **f**_j(**c**) in the following way:

$$(\mathbf{f}_j(\mathbf{c}))^\mathsf{T} = \underbrace{\begin{bmatrix} f_{j,1}(\mathbf{c}) \\ \vdots \\ f_{j,|\mathbf{f}_j|}(\mathbf{c}) \end{bmatrix}}_{\mathbf{c}_j'^\mathsf{T}} = \underbrace{\begin{bmatrix} p_{j,1,1} & \cdots & p_{j,1,|\mathbf{c}|+1} \\ \vdots & & \vdots \\ p_{j,|\mathbf{f}_j|,1} & \cdots & p_{j,|\mathbf{f}_j|,|\mathbf{c}|+1} \end{bmatrix}}_{P_j} \underbrace{\begin{bmatrix} c_1 \\ \vdots \\ c_{|\mathbf{c}|} \\ 1 \end{bmatrix}}_{\mathbf{c}^\dagger} \tag{4}$$

where **c**† is **c** appended with the constant 1.

Given this representation, it is easy to see that the problem of synthesizing the unknown functions **f**_1, ···, **f**_m from Eq. 2 boils down to finding unknown matrices P_1, ···, P_m such that each P_j makes the following implication valid:

$$\Lambda \equiv \left( \left( (\mathbf{c}_j'^\mathsf{T} = P_j \mathbf{c}^\dagger) \land \phi_F(\mathbf{x}, y) \land \bigwedge_{1 \le i \le n} \chi_i(x_i, \mathbf{c}_i) \right) \Rightarrow \chi'_j(y, \mathbf{c}_j') \right) \tag{5}$$

Our key idea is to infer these unknown matrices P_1, ···, P_m in a data-driven way by generating input-output examples of the form [i_1, ···, i_{|**c**|}] → [o_1, ···, o_{|**f**_j|}] for each **f**_j. In other words, **i** and **o** correspond to instantiations of **c** and **f**_j(**c**) respectively. Given sufficiently many such examples for every **f**_j, we can then reduce the problem of learning each unknown matrix P_j to the problem of solving a system of linear equations.
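This reduction can be sketched in a few lines of numpy. The example data below is hypothetical (a two-component **f**_j), and the template and abstraction machinery is elided; this is an illustration of the linear-algebra core, not the paper's implementation.

```python
import numpy as np

def learn_affine(examples):
    """Fit an affine map f_j(c) = P_j @ [c; 1] from input-output examples.

    examples: list of (c, o) pairs, where c instantiates the symbolic
    constants and o is the corresponding value of f_j(c). Returns the
    coefficient matrix P_j, or None when no affine map fits exactly
    (the case where Solve returns null in the paper's terms).
    """
    A = np.array([list(c) + [1] for c, _ in examples], dtype=float)
    B = np.array([list(o) for _, o in examples], dtype=float)
    PT, *_ = np.linalg.lstsq(A, B, rcond=None)  # candidate solution
    if not np.allclose(A @ PT, B):
        return None  # examples are not consistent with any affine map
    return PT.T  # row k holds the coefficients of f_{j,k}

# Hypothetical examples for a two-component f_j(c) = (c1 + c2, 2*c1 + 1):
E = [((1, 2), (3, 3)), ((2, 5), (7, 5)), ((0, 0), (0, 1))]
P = learn_affine(E)  # approximately [[1, 1, 0], [2, 0, 1]]
```

Each example contributes one instantiation of Eq. 4, i.e., one row of the system; once the rows span the coefficient space, the solution is unique.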

Based on this intuition, the LearnTransformers procedure from Fig. 5 describes our algorithm for learning abstract transformers T for a given abstract domain A. At a high level, our algorithm synthesizes one abstract transformer for each DSL construct F and n argument predicate templates χ_1, ···, χ_n. In particular, given F and χ_1, ···, χ_n, the algorithm constructs the "return value" of the transformer as:

$$\varphi = \bigwedge_{1 \le j \le m} \chi'_j(y, \mathbf{f}_j(\mathbf{c}))$$

where **f**_j is the vector of affine functions inferred for each predicate template χ′_j.

The key part of our LearnTransformers procedure is the inner loop (lines 5–8) for inferring each of these **f**_j's. Specifically, given an output predicate template χ′_j, our algorithm first generates a set of input-output examples E of the form [p_1, ···, p_n] → p_0 such that [[F(p_1, ···, p_n)]]♯ = p_0 is a sound (albeit overly specific) transformer. Essentially, each p_i is a concrete instantiation of a predicate template, so the examples E generated at line 6 of the algorithm can be viewed as sound input-output examples for the general symbolic transformer given in Eq. 1. (We describe the GenerateExamples procedure in Sect. 5.1.)

Once we generate these examples E, the next step of the algorithm is to learn the unknown coefficients of matrix P_j from Eq. 5 by solving a system of linear equations (line 7). Specifically, observe that we can use each input-output example [p_1, ···, p_n] → p_0 in E to construct one row of Eq. 4. In particular, we can directly extract **c** = **c**_1, ···, **c**_n from p_1, ···, p_n and the corresponding value of **f**_j(**c**) from p_0. Since we have one instantiation of Eq. 4 for each of the input-output examples in E, the problem of inferring matrix P_j now reduces to solving a system of linear equations of the form A P_j^T = B, where A is a |E| × (|**c**| + 1) (input) matrix and B is a |E| × |**f**_j| (output) matrix. Thus, a solution to the

**Fig. 6.** Example generation for learning abstract transformers.

equation A P_j^T = B generated from E corresponds to a candidate solution for matrix P_j, which in turn uniquely defines **f**_j.

Observe that the call to Solve at line 7 may return *null* if no affine function exists. Furthermore, any *non-null* **f**_j returned by Solve is just a *candidate* solution and may not satisfy Eq. 5. For example, this situation can arise if we do not have sufficiently many examples in E and end up discovering an affine function that is "over-fitted" to the examples. Thus, the validity check at line 8 of the algorithm ensures that the learned transformers are actually sound.
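The over-fitting concern can be made concrete with a small numeric experiment: a candidate fit from too few examples satisfies E but fails on a fresh instantiation. This is a hypothetical illustration; the paper discharges the actual check by asking a theorem prover whether Eq. 5 is valid, not by testing.

```python
import numpy as np

def fit_candidate(examples):
    """Least-squares affine candidate p with f(c1, c2) = p . [c1, c2, 1]."""
    A = np.array([[c1, c2, 1.0] for (c1, c2), _ in examples])
    b = np.array([o for _, o in examples], dtype=float)
    p, *_ = np.linalg.lstsq(A, b, rcond=None)
    return p

# Ground truth (unknown to the learner): f(c1, c2) = c1 + c2.
truth = lambda c1, c2: c1 + c2

# A single example cannot pin down three unknown coefficients:
E = [((3, 2), truth(3, 2))]
p = fit_candidate(E)

assert np.isclose(np.dot([3, 2, 1], p), 5)      # fits the examples in E ...
assert not np.isclose(np.dot([1, 4, 1], p), 5)  # ... but not a fresh instantiation

# Growing E rules out the over-fitted candidate:
E += [((1, 4), truth(1, 4)), ((6, 4), truth(6, 4))]
p = fit_candidate(E)
assert np.isclose(np.dot([1, 4, 1], p), 5)
```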

#### **5.1 Example Generation**

In our discussion so far, we assumed an oracle that is capable of generating valid input-output examples for a given transformer. We now explain our GenerateExamples procedure from Fig. 6 that essentially implements this oracle. In a nutshell, the goal of GenerateExamples is to synthesize input-output examples of the form [p_1, ···, p_n] → p_0 such that [[F(p_1, ···, p_n)]]♯ = p_0 is sound, where each p_i is a concrete predicate (rather than a symbolic one).

Going into more detail, GenerateExamples takes as input the semantics φ_F of the DSL construct F for which we want to learn a transformer, as well as the input predicate templates χ_1, ···, χ_n and output predicate template χ_0 that are supposed to be used in the transformer. For any example [p_1, ···, p_n] → p_0 synthesized by GenerateExamples, each concrete predicate p_i is an instantiation of the predicate template χ_i where the symbolic constants used in χ_i are substituted with *concrete* values.

Conceptually, the GenerateExamples algorithm proceeds as follows: First, it generates *concrete* input-output examples [s_1, ···, s_n] → s_0 by evaluating F on randomly generated inputs s_1, ···, s_n (lines 4–5). Now, for each concrete I/O example [s_1, ···, s_n] → s_0, we generate a set of *abstract* I/O examples of the form [p_1, ···, p_n] → p_0 (line 6). Specifically, we assume that the return value (A_0, ···, A_n) of Abstract at line 6 satisfies the following properties for every p_i ∈ A_i:


In other words, we assume that Abstract returns a set of "best" sound abstractions of (s_0, ···, s_n) under the predicate templates (χ_0, ···, χ_n).

Next, given abstractions (A_0, ···, A_n) for (s_0, ···, s_n), we consider each candidate abstract example of the form [p_1, ···, p_n] → p_0 where p_i ∈ A_i. Even though each p_i is a sound abstraction of s_i, the example [p_1, ···, p_n] → p_0 may not be valid according to the semantics of operator F. Thus, the validity check at line 8 ensures that each example added to E is in fact valid.
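A stripped-down sketch of this example-generation loop for the Concat operator under len(x) = c templates might look as follows. This is hypothetical illustration code, not the paper's implementation; for this particular template each concrete value has a single best abstraction, so the per-example validity check succeeds by construction, whereas in general each candidate must be checked against φ_F.

```python
import random
import string

def abstract_len(s):
    # Best abstraction of a concrete string under the template len(x) = c:
    # instantiate the template's symbolic constant to len(s).
    return len(s)

def generate_examples(n):
    """Sample concrete inputs, run Concat, and abstract all values."""
    E = []
    for _ in range(n):
        s1 = ''.join(random.choices(string.ascii_lowercase,
                                    k=random.randint(0, 8)))
        s2 = ''.join(random.choices(string.ascii_lowercase,
                                    k=random.randint(0, 8)))
        s0 = s1 + s2  # concrete semantics of Concat
        E.append(((abstract_len(s1), abstract_len(s2)), (abstract_len(s0),)))
    return E

# Every generated abstract example is consistent with len(y) = len(x1) + len(x2):
for (c1, c2), (c0,) in generate_examples(3):
    assert c0 == c1 + c2
```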

*Example 4.* Given the abstract domain A = {*len*(α) = c}, suppose we want to learn an abstract transformer τ for the Concat operator of the following form:

$$\left[\mathsf{Concat}\left(\mathit{len}(x_1) = c_1,\; \mathit{len}(x_2) = c_2\right)\right]^\sharp = \left(\mathit{len}(y) = f([c_1, c_2])\right)$$

We learn the affine function f used in the transformer by first generating a set E of I/O examples for f (line 6 in LearnTransformers). In particular, GenerateExamples generates concrete input values for Concat at random and obtains the corresponding output values by executing Concat on the input values. For instance, it may generate s_1 = "abc" and s_2 = "de" as inputs and obtain s_0 = "abcde" as output. Then, it abstracts these values under the given templates. In this case, we have an abstract example with p_1 = (*len*(x_1) = 3), p_2 = (*len*(x_2) = 2), and p_0 = (*len*(y) = 5). Since [p_1, p_2] → p_0 is a valid example, it is added to E (line 8 in GenerateExamples). At this point, E is not yet full rank, so the algorithm keeps generating more examples. Suppose it generates two more valid examples (*len*(x_1) = 1, *len*(x_2) = 4) → (*len*(y) = 5) and (*len*(x_1) = 6, *len*(x_2) = 4) → (*len*(y) = 10). Now E is full rank, so LearnTransformers computes f by solving the following system of linear equations:

$$
\begin{bmatrix} 3 & 2 & 1 \\ 1 & 4 & 1 \\ 6 & 4 & 1 \end{bmatrix} P^\mathsf{T} = \begin{bmatrix} 5 \\ 5 \\ 10 \end{bmatrix} \xrightarrow{\mathsf{Solve}} P = \begin{bmatrix} 1 & 1 & 0 \end{bmatrix}
$$

Here, P corresponds to the function f([c_1, c_2]) = c_1 + c_2, and this function defines the sound transformer [[Concat(*len*(x_1) = c_1, *len*(x_2) = c_2)]]♯ = (*len*(y) = c_1 + c_2), which is added to T at line 9 in LearnTransformers.
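The system from Example 4 can be checked mechanically. Below is a quick sanity check in Python with numpy (the tool itself solves such systems with the JLinAlg Java library):

```python
import numpy as np

# Rows of E: one equation [len(x1), len(x2), 1] -> len(y) per valid example.
A = np.array([[3, 2, 1],
              [1, 4, 1],
              [6, 4, 1]], dtype=float)
b = np.array([5, 5, 10], dtype=float)

# E is full rank, so the system has a unique solution.
assert np.linalg.matrix_rank(A) == 3
p = np.linalg.solve(A, b)
print(np.round(p).astype(int))  # [1 1 0], i.e. f([c1, c2]) = c1 + c2
```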

#### **6 Soundness and Completeness**

In this section we present theorems stating some of the soundness, completeness, and termination guarantees of our approach. All proofs can be found in the extended version of this paper [21].

**Theorem 1 (Soundness of** LearnTransformers**).** *Let* T *be the set of transformers returned by* LearnTransformers*. Then, every* τ ∈ T *is sound according to Eq. 2.*

The remaining theorems are predicated on the assumptions that for each DSL construct F and input predicate templates χ_1, ···, χ_n, (i) there exists a unique best abstract transformer and (ii) the strongest transformer expressible in Eq. 2 is logically equivalent to the unique best transformer. Thus, before stating these theorems, we first state what we mean by a *unique best abstract transformer*.

**Definition 3 (Unique best function).** *Consider a family of transformers of the shape* [[F(χ_1(x_1, **c**_1), ···, χ_n(x_n, **c**_n))]]♯ = χ(y, ◦)*, where* ◦ *is a hole to be filled with a vector of affine functions. We say that* **f** *is the unique best function for* (F, χ_1, ···, χ_n, χ) *if (a) replacing* ◦ *with* **f** *yields a sound transformer, and (b) replacing* ◦ *with any other* **f**′ *yields a transformer that is either unsound or strictly worse (i.e.,* χ(y, **f**) ⇒ χ(y, **f**′) *and* χ(y, **f**′) ⇏ χ(y, **f**)*).*

We now define unique best transformer in terms of unique best function:

**Definition 4 (Unique best transformer).** *Let* F *be a DSL construct and let* (χ_1, ···, χ_n) ∈ A^n *be the input templates for* F*. We say that the abstract transformer* τ *is a unique best transformer for* (F, χ_1, ···, χ_n) *if (a)* τ *is sound, and (b) for any predicate template* χ ∈ A*, we have* (χ, **f**) ∈ Outputs(τ) *if and only if* **f** *is a unique best function for* (F, χ_1, ···, χ_n, χ) *for some affine* **f**.

**Definition 5 (Complete sampling oracle).** *Let* F *be a construct,* A *an abstract domain, and* R_F *a probability distribution over* Domain(F) *with finite support* S*. Further, for any input predicate templates* χ_1, ···, χ_n *and output predicate template* χ_0 *in* A *admitting a unique best function* **f***, let* C(χ_0, ···, χ_n) *be the set of tuples* (c_0, ···, c_n) *such that* (χ_0(y, c_0), χ_1(x_1, c_1), ···, χ_n(x_n, c_n)) ∈ A_0 × ··· × A_n *and* c_0 = **f**(c_1, ···, c_n)*, where* A_0 × ··· × A_n = Abstract(s_0, χ_0, ···, s_n, χ_n) *and* (s_1, ···, s_n) ∈ S *and* s_0 = [[F(s_1, ···, s_n)]]*. The distribution* R_F *is a* complete sampling oracle *if* C(χ_0, ···, χ_n) *has full rank for all* χ_0, ···, χ_n*.*

The following theorem states that LearnTransformers is guaranteed to synthesize the best transformer if a unique one exists:

**Theorem 2 (Completeness of** LearnTransformers**).** *Given an abstract domain* A *and a complete sampling oracle* R_F *for* A*,* LearnTransformers *terminates. Further, let* T *be the set of transformers returned and let* τ *be the unique best transformer for DSL construct* F *and input predicate templates* χ_1, ···, χ_n ∈ A^n*. Then we have* τ ∈ T*.*

Using this completeness (modulo unique best transformer) result, we can now state the termination guarantees of our LearnAbstractions algorithm:

**Theorem 3 (Termination of** LearnAbstractions**).** *Given a complete sampling oracle* R_F *for every abstract domain and the unique best transformer assumption, if there exists a solution for every problem* E_i ∈ **E***, then* LearnAbstractions *terminates.*

#### **7 Implementation and Evaluation**

We have implemented the proposed method as a new tool called Atlas, which is written in Java. Atlas takes as input a set of training problems, an Abstraction-Guided Synthesizer (AGS), and a DSL and returns an abstract domain (in the form of predicate templates) and the corresponding transformers. Internally, Atlas uses the Z3 theorem prover [22] to compute tree interpolants and the JLinAlg linear algebra library [23] to solve linear equations.

To assess the usefulness of Atlas, we conduct an experimental evaluation in which our goal is to answer the following two questions:


#### **7.1 Abstraction Learning**

To answer our first question, we use Atlas to automatically learn abstractions for two application domains: (i) string manipulations and (ii) matrix transformations. We provide Atlas with the DSLs used in [14] and employ Blaze as the underlying Abstraction-Guided Synthesizer. Axiomatic semantics for each DSL construct were given in the theory of equality with uninterpreted functions.

*Training Set Information.* For the string domain, our training set consists of exactly the four problems used as motivating examples in the BlinkFill paper [17]. Specifically, each training problem consists of 4–6 examples that demonstrate the desired string transformation. For the matrix domain, our training set consists of four (randomly selected) synthesis problems taken from online forums. Since almost all online posts contain a single input-output example, each training problem includes one example illustrating the desired matrix transformation.

*Main Results.* Our main results are summarized in Fig. 7. The main takeaway message is that Atlas can learn abstractions quite efficiently and does not require a large training set. For example, Atlas learns 5 predicate templates and 30 abstract transformers for the string domain in a total of 10.2 s. Interestingly, Atlas does not need all the training problems to infer these predicate templates and converges to the final abstraction after processing just the first training instance. Furthermore, for the first training instance, it takes Atlas 4 iterations of the learning loop (lines 5–10 from Fig. 2) to converge to the final abstraction. Since this abstraction is sufficient, each subsequent training problem takes just one iteration to synthesize a correct program.

Looking at the right side of Fig. 7, we also observe similar results for the matrix domain. In particular, Atlas learns 10 predicate templates and 59 abstract transformers in a total of 22.5 s. Furthermore, Atlas converges to the final abstract domain after processing the first three problems<sup>5</sup> and the number of iterations for each training instance is also quite small (ranging from 1 to 3).

<sup>5</sup> The learned abstractions can be found in the extended version of this paper [21].


**Fig. 7.** Training results for the string domain (left) and the matrix domain (right). |A|, |T|, Iters denote the number of predicate templates, abstract transformers, and iterations taken per training instance (lines 5–10 from Fig. 2), respectively. T_AGS, T_A, T_T denote the times for invoking the synthesizer (AGS), learning the abstract domain, and learning the abstract transformers, respectively. T_total shows the total training time in seconds.


**Fig. 8.** Improvement of Blaze over Blaze† on string and matrix benchmarks.

#### **7.2 Evaluating the Usefulness of Learned Abstractions**

To answer our second question, we integrated the abstractions synthesized by Atlas into the Blaze meta-synthesizer. In the remainder of this section, we refer to all instantiations of Blaze using the Atlas-generated abstractions as Blaze. To assess how useful the automatically generated abstractions are, we compare Blaze against Blaze†, which refers to the manually-constructed instantiations of Blaze described in [14].

*Benchmark Information.* For the string domain, our benchmark suite consists of (1) *all* 108 string transformation benchmarks that were used to evaluate Blaze† and (2) 40 additional challenging problems that are collected from online forums which involve manipulating file paths, URLs, etc. The number of examples for each benchmark ranges from 1 to 400, with a median of 7 examples. For the matrix domain, our benchmark set includes (1) *all* 39 matrix transformation benchmarks in the Blaze† benchmark suite and (2) 20 additional challenging problems collected from online forums. *We emphasize that the set of benchmarks used for evaluating* Blaze *are completely* disjoint *from the set of synthesis problems used for training* Atlas.

*Experimental Setup.* We evaluate Blaze and Blaze† using the same DSLs from the Blaze paper [14]. For each benchmark, we provide the same set of input-output examples to Blaze and Blaze†, and use a time limit of 20 min per synthesis task.

*Main Results.* Our main evaluation results are summarized in Fig. 8. The key observation is that Blaze consistently improves upon Blaze† for both string and matrix transformations. In particular, Blaze not only solves more benchmarks than Blaze† in both domains, but also achieves roughly an order-of-magnitude average speed-up on the common benchmarks that both tools can solve. Specifically, for the string domain, Blaze solves 133 (out of 148) benchmarks within an average of 2.8 s and achieves an average 8.3× speed-up over Blaze†. For the matrix domain, we observe a very similar result, with Blaze yielding an overall average speed-up of 9.2×.

In summary, this experiment confirms that the abstractions discovered by Atlas are indeed useful and that they outperform manually-crafted abstractions despite eliminating human effort.

#### **8 Related Work**

To our knowledge, this paper is the first one to automatically learn abstract domains and transformers that are useful for program synthesis. We also believe it is the first to apply interpolation to program synthesis, although interpolation has been used to synthesize other artifacts such as circuits [24] and strategies for infinite games [25]. In what follows, we briefly survey existing work related to program synthesis, abstraction learning, and abstract transformer computations.

*Program Synthesis.* Our work is intended to complement example-guided program synthesis techniques that utilize program abstractions to prune the search space [4,14–16]. For example, Simpl [15] uses abstract interpretation to speed up search-based synthesis and applies this technique to the generation of imperative programs for introductory programming assignments. Similarly, Scythe [16] and Morpheus [4] perform enumeration over program sketches and use abstractions to reject sketches that do not have any valid completion. Somewhat different from these techniques, Blaze constructs a finite tree automaton that accepts all programs whose behavior is consistent with the specification according to the DSL's abstract semantics. We believe that the method described in this paper can be useful to all such abstraction-guided synthesizers.

*Abstraction Refinement.* In verification, as opposed to synthesis, there have been many works that use Craig interpolants to refine abstractions [20,26,27]. Typically, these techniques generalize the interpolants to abstract domains by extracting a vocabulary of predicates, but they do not generalize by adding parameters to form templates. In our case, this is essential because interpolants derived from fixed input values are too specific to be directly useful. Moreover, we *reuse* the resulting abstractions for subsequent synthesis problems. In verification, this would be analogous to re-using an abstraction from one property or program to the next. It is conceivable that template-based generalization could be applied in verification to facilitate such reuse.

*Abstract Transformers.* Many verification techniques use logical abstract domains [28–32]. Some of these, following Yorsh et al. [33], use sampling with a decision procedure to evaluate the abstract transformer [34]. Interpolation has also been used to compile efficient symbolic abstract transformers [35]. However, these techniques are restricted to finite domains or domains of finite height to guarantee convergence. Here, we use infinite parameterized domains to obtain better generalization; hence, the abstract transformer computation is more challenging. Nonetheless, the approach might also be applicable in verification.

#### **9 Limitations**

While this paper takes a first step towards automatically inferring useful abstractions for synthesis, our proposed method has the following limitations:

*Shapes of Transformers.* Following prior work [14], our algorithm assumes that abstract transformers have the shape given in Eq. 1. We additionally assume that constants *c* used in predicate templates are numeric values and that functions in Eq. 1 are affine. This assumption holds in several domains considered in prior work [4,14] and allows us to develop an efficient learning algorithm that reduces the problem to solving a system of linear equations.

*DSL Semantics.* Our method requires the DSL designer to provide the DSL's logical semantics. We believe that giving logical semantics is much easier than coming up with useful abstractions, as it does not require insights about the internal workings of the synthesizer. Furthermore, our technique could, in principle, also work without logical specifications although the learned abstract domain may not be as effective (see Footnote 3 in Sect. 4) and the synthesized transformers would not be provably sound.

*UBT Assumption.* Our completeness and termination theorems are predicated on the *unique best transformer (UBT)* assumption. While this assumption holds in our evaluation, it may not hold in general. However, as mentioned in Sect. 6, we can always guarantee termination by including the concrete predicates used in the interpolant I in addition to the symbolic templates extracted from I.

#### **10 Conclusion**

We proposed a new technique for automatically instantiating abstraction-guided synthesis frameworks in new domains. Given a DSL and a few training problems, our method automatically discovers a useful abstract domain and the corresponding transformers for each DSL construct. From a technical perspective, our method uses tree interpolation to extract reusable templates from failed synthesis attempts and automatically synthesizes unique best transformers if they exist. We have incorporated the proposed approach into the Blaze metasynthesizer and show that the abstractions discovered by Atlas are very useful.

While we have applied the proposed technique to program synthesis, we believe that some of the ideas introduced here are more broadly applicable. For instance, the idea of extracting reusable predicate templates from interpolants and synthesizing transformers in a data-driven way could also be useful in the context of program verification.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **The Learnability of Symbolic Automata**

George Argyros<sup>1</sup> and Loris D'Antoni<sup>2</sup>

<sup>1</sup> Columbia University, New York, NY, USA, argyros@cs.columbia.edu
<sup>2</sup> University of Wisconsin-Madison, Madison, WI, USA, loris@cs.wisc.edu

**Abstract.** Symbolic automata (s-FAs) allow transitions to carry predicates over rich alphabet theories, such as linear arithmetic, and therefore extend classic automata to operate over infinite alphabets, such as the set of rational numbers. In this paper, we study the problem of the learnability of symbolic automata. First, we present MAT*, a novel L*-style algorithm for learning symbolic automata using membership and equivalence queries, which treats the predicates appearing on transitions as their own learnable entities. The main novelty of MAT* is that it can take as input an algorithm Λ for learning predicates in the underlying alphabet theory, and it uses Λ to infer the predicates appearing on the transitions in the target automaton. Using this idea, MAT* is able to learn automata operating over alphabet theories in which predicates are efficiently learnable using membership and equivalence queries. Furthermore, we prove that a necessary condition for efficient learnability of an s-FA is that predicates in the underlying algebra are also efficiently learnable using queries, thereby settling the learnability of a large class of s-FA instances. We implement MAT* in an open-source library and show that it can efficiently learn automata that cannot be learned using existing algorithms and that it significantly outperforms existing automata learning algorithms over large alphabets.

#### **1 Introduction**

In 1987, Dana Angluin showed that finite automata *can be learned* in polynomial time using membership and equivalence queries [3]. In this learning model, often referred to as a *minimally adequate teacher* (MAT), the teacher can answer (*i*) whether a given string belongs to the target language being learned and (*ii*) whether a certain automaton is correct and accepts the target language, providing a counterexample if the automaton is incorrect. Following this result, her L* algorithm has been studied extensively [16,17], extended to several variants of finite automata [4,12,20], and applied in program analysis [2,6,7] and program synthesis [25].

Recent work [6,11] developed algorithms which can efficiently learn s-FAs over certain alphabet theories. These algorithms operate using an underlying predicate learning algorithm which can learn partitions of the domain using predicates from counterexamples. While such results give sufficient conditions under which s-FAs can be efficiently learned, they do not provide any necessary conditions. More precisely, the following question remains open:

#### *For what alphabet theories can s-FAs be efficiently learned?*

In this paper, we make significant progress towards answering this question by providing new sufficient and necessary conditions for efficiently learning symbolic automata. More specifically, we present MAT*, a new algorithm for learning s-FAs using membership and equivalence queries. The main novelty of MAT* is that it can accept as input a MAT learning algorithm Λ for predicates in the underlying alphabet theory. MAT* then spawns instances of Λ to infer each transition in the target s-FA and efficiently answers the membership and equivalence queries performed by Λ using the s-FA membership and equivalence oracles. The predicate learning algorithms need to learn only individual predicates rather than entire partitions; therefore, MAT* greatly simplifies the design of learning algorithms for s-FAs by allowing one to reuse existing learning algorithms for the underlying alphabet theory. Moreover, MAT* allows the underlying predicate learning algorithms to perform *both* membership and equivalence queries, thus extending the class of efficiently learnable s-FAs to MAT-learnable alphabet theories—e.g., bit-vector predicates expressed as BDDs.

Furthermore, we show that a necessary condition for efficiently learning a symbolic automaton over a Boolean algebra is that the individual predicates in the algebra also have to be efficiently learnable. Moreover, we provide a characterization of the instances which are not efficiently learnable by our algorithm and conjecture that such instances are not learnable by any efficient algorithm.

We implement MAT* in the open-source symbolicautomata library [1] and evaluate it on 15 regular-expression benchmarks, 1,500 s-FA benchmarks over bit-vector alphabets, and 18 synthetic benchmarks over infinite alphabets. Our results show that MAT* can efficiently learn automata over different alphabet theories, some of which cannot be learned using existing algorithms. Moreover, for large finite alphabets, MAT* significantly outperforms existing automata learning algorithms.

**Contributions.** In summary, our contributions are:


#### **2 Background**

#### **2.1 Boolean Algebras and Symbolic Automata**

In symbolic automata, transitions carry predicates over a decidable Boolean algebra. An *effective Boolean algebra* A is a tuple (D, Ψ, [[·]], ⊥, ⊤, ∨, ∧, ¬) where D is a set of *domain elements*; Ψ is a set of *predicates* closed under the Boolean connectives, with ⊥, ⊤ ∈ Ψ; and [[·]] : Ψ → 2^D is a *denotation function* such that (*i*) [[⊥]] = ∅, (*ii*) [[⊤]] = D, and (*iii*) for all ϕ, ψ ∈ Ψ, [[ϕ ∨ ψ]] = [[ϕ]] ∪ [[ψ]], [[ϕ ∧ ψ]] = [[ϕ]] ∩ [[ψ]], and [[¬ϕ]] = D \ [[ϕ]].

*Example 1 (Equality Algebra).* The *equality algebra* for an arbitrary set D has predicates formed from Boolean combinations of formulas of the form λc. c = a where a ∈ D. Formally, Ψ is generated from the Boolean closure of Ψ_0 = {φ_a | a ∈ D} ∪ {⊥, ⊤}, where [[φ_a]] = {a} for all a ∈ D. Examples of predicates in this algebra include λc. c = 5 ∨ c = 10 and λc. ¬(c = 0).
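The equality algebra can be sketched concretely. In the snippet below (an illustrative sketch, not an implementation from the paper), a predicate over an arbitrary domain is represented as a finite set together with a polarity flag, which is closed under negation and disjunction as the definition requires:

```python
# Sketch of the equality algebra: a predicate is a pair (S, polarity),
# where (S, True) denotes the set S and (S, False) denotes its complement
# D \ S. All names here are illustrative.

def atom(a):
    """The predicate λc. c = a, i.e., the singleton {a}."""
    return (frozenset([a]), True)

BOT = (frozenset(), True)    # ⊥ denotes ∅
TOP = (frozenset(), False)   # ⊤ denotes the whole domain D

def neg(p):
    s, pos = p
    return (s, not pos)

def disj(p, q):
    (s, sp), (t, tp) = p, q
    if sp and tp:            # S ∪ T
        return (s | t, True)
    if not sp and not tp:    # (D\S) ∪ (D\T) = D \ (S ∩ T)
        return (s & t, False)
    if not sp:               # (D\S) ∪ T = D \ (S \ T)
        return (s - t, False)
    return (t - s, False)    # S ∪ (D\T) = D \ (T \ S)

def denotes(p, c):
    """Membership of character c in ⟦p⟧."""
    s, pos = p
    return (c in s) == pos

# λc. c = 5 ∨ c = 10
phi = disj(atom(5), atom(10))
# λc. ¬(c = 0)
psi = neg(atom(0))
```

Conjunction is omitted for brevity; it is derivable from negation and disjunction by De Morgan's laws.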

**Definition 1 (Symbolic Finite Automata).** *A* symbolic finite automaton *(s-FA)* M *is a tuple* (A, Q, q_init, F, Δ) *where* A *is an effective Boolean algebra, called the* alphabet*;* Q *is a finite set of states;* q_init ∈ Q *is the* initial state*;* F ⊆ Q *is the set of* final states*; and* Δ ⊆ Q × Ψ_A × Q *is the* transition relation*, consisting of a finite set of* moves *or* transitions*.*

*Characters* are elements of D_A, and *words* or *strings* are finite sequences of characters, i.e., elements of D_A^∗. The empty word, of length 0, is denoted by ε. A move ρ = (q_1, φ, q_2) ∈ Δ, also written q_1 −φ→ q_2, is a transition from the *source* state q_1 to the *target* state q_2, where φ is the *guard* or *predicate* of the move. For a state q ∈ Q, we denote by guard(q) the set of guards of all moves out of q. For a character a ∈ D_A, an *a-move* of M, written q_1 −a→ q_2, is a move q_1 −φ→ q_2 such that a ∈ [[φ]].

An s-FA M is *deterministic* if, for all transitions (q, φ_1, q_1), (q, φ_2, q_2) ∈ Δ, q_1 ≠ q_2 implies [[φ_1 ∧ φ_2]] = ∅—i.e., for each state q and character a there is at most one a-move out of q. An s-FA M is *complete* if, for all q ∈ Q, [[⋁_{(q,φ_i,q_i)∈Δ} φ_i]] = D—i.e., for each state q and character a there exists an a-move out of q. Throughout the paper we assume all s-FAs are deterministic and complete, since determinization and completion are always possible [10]. Given an s-FA M = (A, Q, q_init, F, Δ) and a state q ∈ Q, we say a word w = a_1 a_2 ··· a_k is *accepted at state* q if there exist moves q_{i−1} −a_i→ q_i for 1 ≤ i ≤ k such that q_0 = q and q_k ∈ F.

For a deterministic s-FA M and a word w, we denote by M_q[w] the state reached in M by w when starting at state q; when q is omitted, we assume execution starts at q_init. For a word w = a_1 ··· a_k, we use w[i..] = a_i ··· a_k, w[..i] = a_1 ··· a_i, and w[i] = a_i to denote the suffix starting at the i-th position, the prefix up to the i-th position, and the character at the i-th position, respectively. We use B = {**T**, **F**} to denote the Boolean domain. A word w is called an *access string* for state q ∈ Q if M[w] = q. For two states q, p ∈ Q, a word w is called a *distinguishing string* if exactly one of M_q[w] and M_p[w] is final.
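The definitions above can be made concrete with a short sketch. The following snippet (illustrative names, assuming guards are given as arbitrary Boolean functions on characters) implements a deterministic, complete s-FA together with M_q[w] and acceptance:

```python
# A minimal sketch of a deterministic, complete s-FA; guards are Python
# predicates over the domain. Names are illustrative, not the paper's API.

class SFA:
    def __init__(self, states, q_init, finals, moves):
        # moves: list of (source, guard, target) with guard : character -> bool
        self.states = states
        self.q_init, self.finals, self.moves = q_init, set(finals), moves

    def step(self, q, a):
        # Determinism and completeness: exactly one a-move out of q.
        targets = [t for (s, g, t) in self.moves if s == q and g(a)]
        assert len(targets) == 1, "s-FA must be deterministic and complete"
        return targets[0]

    def run(self, w, q=None):
        """M_q[w]: the state reached by w starting at q (q_init by default)."""
        q = self.q_init if q is None else q
        for a in w:
            q = self.step(q, a)
        return q

    def accepts(self, w):
        return self.run(w) in self.finals

# A toy s-FA over the integers accepting words whose characters are all even.
M = SFA(states={0, 1}, q_init=0, finals={0},
        moves=[(0, lambda a: a % 2 == 0, 0),
               (0, lambda a: a % 2 != 0, 1),
               (1, lambda a: True, 1)])
```

Note that the two guards out of state 0 are disjoint and their union covers the whole domain, matching the determinism and completeness conditions.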

#### **2.2 Learning Model**

In this paper, we follow the notation from [17]. A *concept* is a Boolean function c : D → B. A *concept class* C is a set of concepts represented using a *representation class* R, i.e., a fixed function from strings to concepts in C. For example, regular expressions, DFAs, and NFAs are different representation classes for the concept class of regular languages.

The learning model under which all learning algorithms in this paper operate is called *exact learning from membership and equivalence queries* or learning using a Minimal Adequate Teacher (MAT), and was originally introduced by Angluin [3]. In this model, to learn an unknown concept <sup>c</sup> ∈ C, a learning algorithm has access to two types of queries:


An algorithm is a *learning algorithm* for a concept class C if, for any c ∈ C, the algorithm terminates with a correct model for c after making a finite number of membership and equivalence queries. In this paper, we say that a learning algorithm is *efficient* for a concept class C if it learns any concept c ∈ C using a number of queries polynomial in the size of the representation of the target concept in R and in the length of the longest counterexample provided to the algorithm.

An effective Boolean algebra A = (D, Ψ, [[·]], ⊥, ⊤, ∨, ∧, ¬) naturally defines a concept class over the domain D with representation class Ψ: each predicate φ ∈ Ψ represents the concept [[φ]] ⊆ D. We say that an algorithm is a learning algorithm for the algebra A to denote a learning algorithm that can efficiently learn predicates from the representation class Ψ.

#### **3 The** *MAT <sup>∗</sup>* **Algorithm**

Our learning algorithm, MAT<sup>∗</sup>, can be viewed as a symbolic version of the TTT algorithm for learning DFAs [16], but without discriminator finalization. The learning algorithm accepts as input a membership oracle O, an equivalence oracle E, as well as a learning algorithm Λ for the underlying Boolean algebra used in the target s-FA M. The algorithm uses a classification tree [17] to generate a partition of D<sup>∗</sup> into equivalence classes which represent the states in the target s-FA. Once a tree is obtained, we can use it to determine, for any word w ∈ D<sup>∗</sup>, the state accessed by w in M—i.e., what state the automaton reaches when reading the word w. Then, we build an s-FA model H, using the algebra learning algorithm Λ to create models for each transition guard and utilizing the classification tree in order to implement a membership oracle for Λ.

**Algorithm 1.** s-FA-Learn(O, E, Λ) // s-FA learning algorithm

```
Require: O: membership oracle, E: equivalence oracle, Λ: algebra learning algorithm
T ← InitializeClassificationTree(O)
S_Λ ← InitializeGuardLearners(T, Λ)
H ← GetSFAModel(T, S_Λ, O)
while E(H) ≠ T do
    w ← GetCounterexample(H)
    T, S_Λ ← ProcessCounterexample(T, S_Λ, w, O)
    H ← GetSFAModel(T, S_Λ, O)
return H
```

Once a model is generated, we check for equivalence and, given a counterexample, we either update the classification tree with a new state and a corresponding distinguishing string, or propagate the counterexample into one of the instances of the algebra learning algorithm Λ. The structure of MAT<sup>∗</sup> is shown in Algorithm 1. In the rest of the section, we use the s-FA in Fig. 1 as a running example for our algorithm.

#### **3.1 The Classification Tree**

The main data structure used by our learning algorithm is the classification tree (CT) [17]: a tree data structure that stores the access and distinguishing strings for the target s-FA, such that each internal node of the tree is labeled with a distinguishing string, while each leaf is labeled with an access string.

**Fig. 1.** An s-FA over the equality algebra.

**Definition 2.** *A classification tree* T = (V, L, E) *is a binary tree such that:*


Intuitively, given any internal node v ∈ V, any leaf l_T reached by following the **T**-child of v can be distinguished from any leaf l_F reached by following the **F**-child using v. In other words, the membership queries for l_T · v and l_F · v produce different results—i.e., O(l_T · v) ≠ O(l_F · v).

*Tree Initialization.* To initialize the CT data structure, we use a membership query on the empty word ε. Then, we create a CT with two nodes: a root node labeled with ε and one child also labeled with ε. The child of the root is either a **T**-child or an **F**-child, according to the result of the O(ε) query.

**Fig. 2.** (left) Classification tree and corresponding learned states for our running example. (right) Two different instances of failed partition verification checks that occurred during learning, and their respective updates on the given counterexamples (CE).

*The sift Operation.* The main operation performed using the classification tree is an operation called sift, which allows one to determine, for any input word s, the state reached by s in the target s-FA. The sift(s) operation performs the following steps:


Note that, until both children of the root node are added, we will have inputs that may not end up in any leaf node. In these cases our sift operation will return <sup>⊥</sup> and MAT <sup>∗</sup> will add the queried input as a new leaf in the tree.
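The classification tree and the sift operation can be sketched as follows (an illustrative sketch with a toy target, not the paper's implementation): starting from the root, sift concatenates the input with the current node's distinguishing string, queries the membership oracle, and descends to the **T**- or **F**-child until reaching a leaf, whose access string identifies the state.

```python
# Sketch of a classification tree with the sift operation. Internal nodes
# carry distinguishing strings; leaves carry access strings. Names are
# illustrative.

class Node:
    def __init__(self, label, t_child=None, f_child=None):
        self.label = label                  # distinguishing or access string
        self.t_child, self.f_child = t_child, f_child

    def is_leaf(self):
        return self.t_child is None and self.f_child is None

def sift(root, s, oracle):
    node = root
    while not node.is_leaf():
        # Query O(s · d) for the node's distinguishing string d and descend.
        node = node.t_child if oracle(s + node.label) else node.f_child
        if node is None:
            return None   # ⊥: no leaf yet on this side of the root
    return node.label     # access string of the state reached by s

# Toy target over {'a','b'}: words with an even number of 'a's.
def oracle(w):
    return w.count('a') % 2 == 0

# Tree for this target: the root distinguishes with ε; leaves '' and 'a'.
tree = Node('', t_child=Node(''), f_child=Node('a'))
```

In this toy tree, every word with an even count of 'a' sifts to the leaf '' and every other word sifts to the leaf 'a'.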

Once a classification tree is obtained, we use it to simulate a membership oracle for the underlying algebra learning algorithm Λ. This oracle is then used to infer models for the transitions and eventually construct an s-FA model. In Fig. 2 we show the classification tree and the corresponding states learned by the MAT <sup>∗</sup> algorithm during the execution on our running example from Fig. 1.

#### **3.2 Building an s-FA Model**

Assume we are given a classification tree T = (V, L, E). Our next task is to use the tree, along with the underlying algebra learning algorithm Λ, to produce an s-FA model. The main idea is to spawn an instance of the Λ algorithm for each potential transition and then use the classification tree to answer membership queries posed by each Λ instance. Initially, we define an s-FA H = (A, Q_H, q_ε, F_H, Δ_H), where Q_H = {q_s | s ∈ L}—i.e., we create one state q_s for each leaf s of the classification tree T—and, for any q_s ∈ Q_H, we have q_s ∈ F_H if and only if O(s) = **T**. Next, we show how to build the transition relation for H. As mentioned above, our construction is based on the idea of spawning an instance of Λ for each potential transition of the s-FA and then using the classification tree to decide, for each character, whether the character satisfies the guard of the potential transition, thus answering the membership queries performed by the underlying algebra learner.

*Guard Inference.* To infer the set of guards in the transition relation <sup>Δ</sup>H, we spawn, for each pair of states (qu, qv) <sup>∈</sup> <sup>Q</sup><sup>H</sup> <sup>×</sup> <sup>Q</sup>H, an instance <sup>Λ</sup>(q*u*,q*v*) of the algebra learning algorithm. We answer membership queries to Λ(q*u*,q*v*) as follows. Let <sup>α</sup> <sup>∈</sup> <sup>D</sup> be a symbol queried by <sup>Λ</sup>(q*u*,q*v*) . Then, we return **T** as the answer to <sup>O</sup>(α) if sift(uα) = <sup>v</sup> and **<sup>F</sup>** otherwise. Once <sup>Λ</sup>(q*u*,q*v*) submits an equivalence query <sup>E</sup>(φ) using a model <sup>φ</sup>, we suspend the execution of the algorithm and add the transition (qu, φ, qv) in <sup>Δ</sup>H.
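The reduction from guard membership queries to tree sifting can be sketched in a few lines (illustrative names; the toy sift below stands in for a real classification tree over a target accepting words with an even number of 'a's):

```python
# Sketch: answering a membership query posed by the guard learner for the
# (q_u, q_v) transition. The learner asks whether character alpha satisfies
# the guard from q_u to q_v; we answer T exactly when sifting u·alpha reaches
# the leaf v. sift_fn is any sift implementation; names are illustrative.

def guard_membership(u, v, alpha, sift_fn):
    return sift_fn(u + alpha) == v

# Toy sift over a two-state target: a word sifts to leaf '' when it has an
# even number of 'a' characters, and to leaf 'a' otherwise.
def toy_sift(w):
    return '' if w.count('a') % 2 == 0 else 'a'
```

For instance, from the state accessed by '' the character 'a' satisfies the guard leading to the state accessed by 'a', but not the self-loop guard.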

*Partition Verification.* Once all algebra learners have submitted a model through an equivalence query, we have a complete transition relation <sup>Δ</sup>H. However, at this point there is no guarantee that for each state q the outgoing transitions from q form a partition of the domain D. Therefore, it may be the case that our s-FA model H is in fact non-deterministic and, moreover, that certain symbols do not satisfy any guard. Using such a model in an equivalence query would result in an *improper* learning algorithm and potential problems in the counterexample processing algorithm in Sect. 3.3. To mitigate this issue we perform the following checks:


These checks are iterated for each state until no more counterexamples are found. In Fig. 2 we demonstrate instances of failed determinism and completeness checks while learning our running example from Fig. 1 along with the corresponding updates on the predicates. For details regarding the equality algebra learner, see Sect. 5.

*Optimizing the Number of Algebra Learning Instances.* Note that, in the description above, MAT<sup>∗</sup> spawns one instance of Λ for each possible transition between states in H. To reduce the number of spawned algebra learning instances, we perform the following optimization: for each state q_s we initially spawn a single algebra learning instance Λ(q_s,?). Let α be the first symbol queried by Λ(q_s,?) and let u = sift(sα). We return **T** as the query answer for α to Λ(q_s,?) and set the target state of the instance to q_u, i.e., we convert the algebra learning instance to Λ(q_s,q_u). Afterwards, we keep a set R = {q_v | v = sift(sβ)} for all symbols β ∈ D queried by the different algebra learning instances, and generate new instances only for those states q_v ∈ R for which the guards are not yet inferred. Using this optimization, the total number of generated algebra learning instances never exceeds the number of transitions in the target s-FA.

#### **3.3 Processing Counterexamples**

For counterexample processing, we adapt the algorithm used in [6] in the setting of MAT <sup>∗</sup> . In a nutshell, our algorithm works similarly to the classic Rivest-Schapire algorithm [23] and the TTT algorithm [16] for learning DFAs, where a binary search is performed to locate the index in the counterexample where the executions of the model automaton and the target one diverge. However, once this breakpoint index is found, our algorithm performs further analysis to determine if the divergence is caused by an undiscovered state in our model automaton or because the guard predicate that consumes the breakpoint index character is incorrect.

*Error Localization.* Let w be a counterexample for a model H generated as described above. For each index i ∈ [0..|w|], let q_u = H[w[..i]] be the state accessed by w[..i] in H and let γ_i = u · w[i+1..]. In other words, γ_i is obtained by first running w in H for i steps and then concatenating the access string u of the state reached in H with the suffix w[i+1..]. Note that, because the model H and the target s-FA initially start at the same state, accessed by ε, the two machines are synchronized and therefore O(γ_0) = O(w). Moreover, since w is a counterexample, we have that O(γ_{|w|}) ≠ O(w). It follows that there exists an index j, which we will refer to as a *breakpoint*, for which O(γ_j) ≠ O(γ_{j+1}). The counterexample processing algorithm uses a binary search on the index j to find such a breakpoint. For more information on the correctness of this method, we refer the reader to [6,23].
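The binary search for the breakpoint can be sketched as follows (illustrative names; access_of(i) is assumed to return the access string of H[w[..i]], with access_of(0) = ε):

```python
# Sketch of breakpoint localization. With gamma(i) = access_of(i) + w[i:]
# (0-based suffix, matching the paper's γ_i = u · w[i+1..]) we have
# O(gamma(0)) = O(w) and O(gamma(|w|)) ≠ O(w), so a binary search finds an
# index j with O(γ_j) ≠ O(γ_{j+1}). Names are illustrative.

def find_breakpoint(w, oracle, access_of):
    def gamma(i):
        return access_of(i) + w[i:]
    target = oracle(gamma(0))      # O(γ_0); equals O(w) when access_of(0) = ε
    lo, hi = 0, len(w)             # invariant: O(γ_lo) = target ≠ O(γ_hi)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if oracle(gamma(mid)) == target:
            lo = mid
        else:
            hi = mid
    return lo                      # breakpoint: O(γ_lo) ≠ O(γ_{lo+1})

# Example: the target accepts words with an even number of 'a's, while a
# one-state hypothesis H maps every prefix to the empty access string.
def even_a(s):
    return s.count('a') % 2 == 0
```

For instance, on the counterexample 'bba' with the one-state hypothesis, the executions diverge on the final character.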

*Breakpoint Analysis.* Once we find an index j such that O(γ_j) ≠ O(γ_{j+1}), we can conclude that the transition taken in H from H[w[..j]] on the symbol w[j+1] is incorrect. In traditional algorithms for learning DFAs, the sole reason for an incorrect transition would be that the transition is actually directed to a yet undiscovered state in the target automaton. In the symbolic setting, however, we have to explore two different possibilities. Let q_u = H[w[..j]] be the state accessed in H by w[..j], let q_v = sift(u · w[j+1]) be the result of sifting u · w[j+1] in the classification tree, and consider the transition (q_u, φ, q_v) ∈ Δ_H. We use the guard φ to determine whether the counterexample was caused by an invalid predicate guard or by an undiscovered state in the target s-FA.

*Case 1. Incorrect guard.* Assume that w[j+1] ∉ [[φ]]. Note that φ was generated as a model by Λ(q_u,q_v) and, therefore, a membership query from Λ(q_u,q_v) for a character α returns **T** if sift(u·α) = v. Moreover, we have that sift(u · w[j+1]) = v. Therefore, if w[j+1] ∉ [[φ]], then w[j+1] is a counterexample for the learning instance Λ(q_u,q_v) which produced φ. We proceed to supply Λ(q_u,q_v) with the counterexample w[j+1], update the corresponding guard, and continue to generate a new s-FA model.

**Fig. 3.** (left) A minimal s-FA. (right) The s-FA corresponding to the classification tree of MAT<sup>∗</sup> with access strings for q_init and q_2 and a single distinguishing string ε.

*Case 2. Undiscovered state.* Assume w[j+1] ∈ [[φ]]. It follows that φ behaves as expected on the symbol w[j+1] with respect to the current classification tree. We conclude that the state accessed by w[..j+1] is in fact an undiscovered state in the target s-FA, which we have to distinguish from the previously discovered states. Therefore, we proceed to add a new leaf to the tree to access this state. More specifically, we replace the leaf labeled with v with a sub-tree consisting of three nodes: the root is the word w[j+2..], which is a distinguishing string for the states accessed by v and u · w[j+1]. The **T**-child and **F**-child of this node are labeled with the words v and u · w[j+1], based on the results of O(v · w[j+2..]) and O(u · w[j+1] · w[j+2..]).

Finally, we have to take care of one last point: once we add another state to the classification tree, certain queries that were previously directed to v may be directed to the new leaf u · w[j+1] once we sift them down the tree. This change implies that certain previous queries performed by algebra learning instances of the form Λ(q_s,q_v) may have been given invalid results, and, therefore, we can no longer guarantee the correctness of the generated predicates. To solve this problem, we terminate the instances Λ(q_s,q_v) for all q_s ∈ Q_H and replace them with fresh instances of the algebra learning algorithm.

#### **4 Correctness and Completeness of** *MAT <sup>∗</sup>*

Given a learning algorithm Λ, we use C^Λ_m(n) to denote the number of membership queries and C^Λ_e(n) to denote the number of equivalence queries performed by Λ for a target concept with representation size n. In our analysis we will also use the following definitions:

**Definition 3.** *Let* <sup>M</sup> = (A, Q, q0, F,Δ) *over a Boolean algebra* <sup>A</sup> *and let* <sup>S</sup> <sup>⊆</sup> <sup>Ψ</sup>A*. Then, we define:*

– *The maximum size of the union of predicates in* S *as* U(S) := max_{Φ⊆S} |⋁_{φ∈Φ} φ|*.*

– *The maximum guard union size for* M *as* B(M) := max_{q∈Q} U(guard(q))*.*

The value B(M) denotes the maximum size that a predicate guard may take in any intermediate hypothesis produced by MAT <sup>∗</sup> during the learning process. Contrary to traditional L∗-style algorithms, the size of the intermediate hypothesis produced by MAT <sup>∗</sup> may fluctuate as we demonstrate in the following example.

*Example 2.* Consider the s-FA on the left side of Fig. 3. When we execute the MAT<sup>∗</sup> algorithm on this s-FA, after an access string for q_2 has been added to the classification tree, the tree will correspond to the s-FA shown on the right, in which the transition from q_init is taken over the union of the individual transitions in the target. Certain sequences of answers to equivalence queries can force MAT<sup>∗</sup> to first learn a correct model of φ_1 ∨ φ_2 ∨ φ_3 before revealing a new state in the target s-FA.

We now state the correctness and query complexity of our algorithm.

**Theorem 1.** *Let* M = (A, Q, q_0, F, Δ) *be an s-FA, let* Λ *be a learning algorithm for* A*, and let* k = B(M)*. Then,* MAT<sup>∗</sup> *will learn* M *using* Λ *with* O(|Q|^2 |Δ| C^Λ_m(k) + |Q|^2 |Δ| C^Λ_e(k) log m) *membership queries and* O(|Q| |Δ| C^Λ_e(k)) *equivalence queries, where* m *is the length of the longest counterexample given to* MAT<sup>∗</sup>*.*

*Proof.* First, we note that our counterexample processing algorithm only splits a leaf if there exists a valid distinguishing condition separating the two newly generated leaves. Therefore, the number of leaves in the classification tree is always at most |Q|. Next, note that each counterexample is processed using a binary search with complexity O(log m) to detect the breakpoint and that, afterwards, either a new state is added or the counterexample is dispatched to the corresponding algebra learner.

Each classification tree T = (V, L, E) defines a partition over D^∗ and, therefore, an s-FA H_T. In the worst case, MAT<sup>∗</sup> will learn H_T exactly before a new state in the target s-FA is revealed through an equivalence query. Since H_T is the result of merging states in the target s-FA, we conclude that the size of each predicate in H_T is at most k. It follows that, for each classification tree T, we can get at most |Δ_{H_T}| C^Λ_e(k) counterexamples until a new state is uncovered in the target s-FA. Note here that our counterexample processing algorithm ensures that each counterexample is either a valid counterexample for a predicate guard in H_T or uncovers a new state. For each membership query performed by an underlying algebra learner, we have to sift a string in the classification tree, which requires at most |Q| membership queries. Therefore, the total number of membership queries performed for each candidate model H is bounded by O(|Δ|(|Q| C^Λ_m(k) + C^Λ_e(k) log m)), where m is the length of the longest counterexample so far, while the number of equivalence queries is bounded by O(|Δ| C^Λ_e(k)). When a new state is uncovered, we assume that, in the worst case, all the algebra learners are restarted (this is an overestimation), and therefore the same process is repeated at most |Q| times, giving us the stated bounds.

Note that the bounds on the number of queries stated in Theorem 1 are based on the worst-case assumption that we may have to restart *all* guard learning instances each time we discover a new state. In practice, we expect these bounds to be closer to O(|Δ| C^Λ_m(k) + (|Δ| C^Λ_e(k) + |Q|) log m) membership queries and O(|Δ| C^Λ_e(k) + |Q|) equivalence queries.

**Minimality of Learned s-FA.** Since MAT<sup>∗</sup> adds a new state to the s-FA only when a distinguishing sequence is found, it follows that the total number of states in the learned s-FA is minimal. Moreover, MAT<sup>∗</sup> does not modify in any way the predicates returned by the underlying algebra learning instances. Therefore, if the sizes of the predicates returned by the Λ instances are minimal, MAT<sup>∗</sup> maintains their minimality.

The following theorem shows that it is indeed not possible to learn s-FAs over a Boolean algebra that is not itself learnable.

**Theorem 2.** *Let* Λ*s-FA be an efficient learning algorithm for the algebra of s-FAs over a Boolean algebra* A*. Then, the Boolean algebra* A *is efficiently learnable.*

**Which s-FAs Are Efficiently Learnable?** Theorem 2 shows that efficient learnability of an s-FA requires efficient learnability of the underlying algebra. Moreover, from Theorem 1 it follows that efficient learnability using MAT<sup>∗</sup> depends on the following property of the underlying algebra:

**Corollary 1.** *Let* A *be an efficiently learnable Boolean algebra and consider the class* R^s-FA_A *of s-FAs over* A*. Then,* R^s-FA_A *is efficiently learnable using* MAT<sup>∗</sup> *if and only if, for any set* S ⊆ Ψ_A *such that* [[φ ∧ ψ]] = ∅ *for all distinct* φ, ψ ∈ S*, we have that* U(S) = poly(|S|, max_{φ∈S} |φ|)*.*

At this point, we would like to point out that the above condition arises because MAT<sup>∗</sup> is a congruence-based algorithm, which successively computes hypothesis automata by refining a set of access and distinguishing strings, a characteristic common to all L∗-based algorithms. Therefore, this limitation of MAT<sup>∗</sup> is expected to be shared by any other algorithm in the same family. Given that, after three decades of research, L∗-based algorithms are the only known, provably efficient algorithms for learning DFAs (and, subsequently, s-FAs), we expect that expanding the class of learnable s-FAs is a very challenging task.

#### **5 Learnable Boolean Algebras**

We will now describe a number of interesting effective Boolean algebras which are efficiently learnable using membership and equivalence queries.

*Boolean Algebras over Finite Domains.* Let A be any Boolean algebra over a finite domain D. Then, any predicate φ ∈ Ψ can be learned using |D| membership queries: the learning algorithm constructs a predicate φ accepting exactly those elements for which the membership query returns true, i.e., [[φ]] = {c ∈ D | O(c) = **T**}. Plugging this algebra learning algorithm into our algorithm, we obtain the TTT learning algorithm for DFAs without discriminator finalization [16]. This simple example demonstrates that algorithms for learning DFAs can be viewed as special cases of our s-FA learning algorithm for finite domains.
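The exhaustive learner for a finite domain can be sketched in a single function (illustrative names, with the learned predicate returned as a set of accepted characters):

```python
# Sketch of the finite-domain algebra learner: one membership query per
# domain element and no equivalence queries; the learned predicate is the
# set of characters the oracle accepts. Names are illustrative.

def learn_finite_predicate(domain, membership):
    return frozenset(c for c in domain if membership(c))
```

This uses exactly |D| membership queries, matching the bound stated above.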

*Equality Algebra.* Consider the equality algebra defined in Example 1. Predicates in this algebra of size |φ| = k can be learned using 2k equivalence queries and no membership queries. Initially, the algorithm outputs the empty set ⊥ as its hypothesis. In each subsequent step, the algorithm keeps the counterexamples obtained so far in two sets P, N ⊆ D, such that P holds all the positive examples received so far and N holds all the negative ones. Afterwards, the algorithm finds the smallest hypothesis consistent with the counterexamples received. This hypothesis can be found efficiently as follows:

1. If |P| > |N|, then φ = λc. ¬(⋁_{d∈N} c = d).
2. If |P| ≤ |N|, then φ = λc. ⋁_{d∈P} c = d.

It can be easily shown that the algorithm will find a correct hypothesis after at most 2k equivalence queries.
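The learner described above can be sketched as follows (an illustrative representation, not the paper's implementation: a hypothesis (S, True) denotes the set S and (S, False) denotes its complement):

```python
# Sketch of the equality-algebra learner: it stores the positive (P) and
# negative (N) counterexamples seen so far and always proposes the smaller
# of the two consistent encodings. Names are illustrative.

class EqualityAlgebraLearner:
    def __init__(self):
        self.P, self.N = set(), set()   # positive / negative counterexamples

    def hypothesis(self):
        if len(self.P) > len(self.N):
            return (frozenset(self.N), False)   # λc. ¬(⋁_{d∈N} c = d)
        return (frozenset(self.P), True)        # λc. ⋁_{d∈P} c = d

    def process_counterexample(self, d, label):
        # label is True for a positive counterexample, False for a negative one.
        (self.P if label else self.N).add(d)

def denotes(h, c):
    """Membership of character c in the denotation of hypothesis h."""
    s, pos = h
    return (c in s) == pos
```

Note that the initial hypothesis is (∅, True), i.e., ⊥, and that after a single positive counterexample the smaller consistent encoding is ⊤, exactly as the case analysis above prescribes.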

*Other Algebras.* The following Boolean algebras can be efficiently learned using membership and equivalence queries. All these algebras also have approximate fingerprints [3], which means that they are not learnable by equivalence queries alone. Thus, s-FAs over these algebras are not efficiently learnable by previous s-FA learning algorithms [6,11].


#### **6 Evaluation**

We have implemented MAT<sup>∗</sup> in the open-source symbolicautomata library [1], as well as the learning algorithms for Boolean algebras over finite domains, equality algebras, and BDD algebras, as discussed in Sect. 5. Our implementation is fully modular: once an algebra learning algorithm is defined in our library, it can be seamlessly plugged in as a guard learning algorithm for s-FAs. Since MAT<sup>∗</sup> is itself an algebra learning algorithm, this allows us to easily learn automata over automata. All experiments were run on a MacBook Air with a 1.8 GHz Intel Core i5 and 8 GiB of memory. The goal of our evaluation is to answer the following research questions:

**Q1:** How does MAT <sup>∗</sup> perform on automata over large finite alphabets? (Subsect. 6.1)


**Table 1.** Evaluation of MAT <sup>∗</sup> on regular expressions.


#### **6.1 Equality Algebra Learning**

In this experiment, we use MAT<sup>∗</sup> to learn s-FAs obtained from 15 regular expressions drawn from three domains: (1) regular expressions used in web application sanitization frameworks such as the CodeIgniter framework, (2) regular expressions drawn from the popular web application firewall ModSecurity, and (3) regular expressions from [18]. For this set of experiments, we use as alphabet the entire UTF-16 character set (2<sup>16</sup> characters) and the equality algebra to represent predicates. Since the alphabet is finite, we also tried learning the same automata using TTT [16], the most efficient algorithm for learning finite automata over finite alphabets.

*Results.* Table 1 presents the results of MAT <sup>∗</sup>. The **Memb** and **Equiv** columns present the number of distinct membership and equivalence queries respectively. The **R-CE** column shows how many times a counterexample was reused, while the **GU** column shows the number of counterexamples that were used to update an underlying predicate (as opposed to adding a new state in the s-FA). Finally, **D-CE** shows the number of counterexamples provided to an underlying algebra learner due to failed determinism checks, while **C-CE** shows the number of counterexamples due to failed completeness checks. Note that these counterexamples did not require invoking the equivalence oracle.

Given the large alphabet sizes, TTT runs out of memory on all our benchmarks. This is not surprising, since the number of queries required by TTT just to construct the *correct* model for a DFA with 128 = 2<sup>7</sup> states is at least |Σ||Q| log |Q| = 2<sup>16</sup> · 2<sup>7</sup> · 7 ≈ 2<sup>26</sup>. We point out that a corresponding lower bound of Ω(|Σ||Q| log |Q|) exists on the number of queries any DFA learning algorithm may perform, and therefore the size of the alphabet poses a fundamental limitation for any such algorithm.

*Analysis.* First, we observe that the performance of the algorithm is not always monotone in the number of states or transitions of the s-FA. For example, RE.10 requires more than 10x more membership and equivalence queries than RE.7, despite the fact that both the number of states and transitions of RE.10 are smaller. In this case, RE.10 has fewer transitions, but they contain predicates that are harder to learn—e.g., large character classes. Second, the completeness check and the corresponding counterexamples are not only useful to ensure that the generated guards form a partition, but also to restore predicates after new states are discovered. Recall that, once we discover (split) a new state, a number of learning instances are discarded. Usually, the newly created learning instances will simply output ⊥ as the initial hypothesis. At this point, completeness counterexamples are used to update the newly created hypotheses accordingly, saving MAT<sup>∗</sup> from having to rerun a large number of equivalence queries. Finally, we point out that the equality algebra learner made no special assumptions on the structure of the predicates, such as the character classes commonly used in regular expressions. We expect that adding such heuristics can greatly improve the performance of MAT<sup>∗</sup> on these benchmarks.

#### **6.2 BDD Algebra Learning**

In this experiment, we use MAT<sup>∗</sup> to learn s-FAs over a BDD algebra. We run MAT<sup>∗</sup> on 1,500 automata obtained by transforming Linear Temporal Logic formulas over finite traces into s-FAs [9]. The formulas have four atomic propositions, and the height of each BDD used by the s-FAs is four. To learn the underlying BDDs, we use MAT<sup>∗</sup> with the learning algorithm for algebras over finite domains (see Sect. 5), since ordered BDDs can be seen as s-FAs over D = {0, 1}.
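To illustrate the reduction mentioned above, an ordered BDD over four propositions can be read as an acceptor of length-4 bit strings, one bit per variable level. The following is a minimal sketch with a hand-built BDD; the node layout and the example function are our own, not the evaluated benchmarks:

```python
# Ordered-BDD-as-automaton sketch: an inner node is a pair (low, high),
# a terminal is a bool.  Reading one input bit per variable level, the
# BDD accepts exactly the length-4 bit strings on which the encoded
# Boolean function is true, i.e., it behaves as a DFA over D = {0, 1}.

def bdd_accepts(node, bits):
    for b in bits:
        if isinstance(node, bool):   # remaining variables are irrelevant
            break
        node = node[b]
    return node is True

# Hand-built example (ours): f(x1, x2, x3, x4) = x1 AND x3
x3_node = (False, True)          # tests x3; x4 is irrelevant
x2_node = (x3_node, x3_node)     # x2 is irrelevant
root = (False, x2_node)          # tests x1

print(bdd_accepts(root, [1, 0, 1, 0]))   # True
print(bdd_accepts(root, [0, 1, 1, 1]))   # False
```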

Figure 4 shows the number of membership (top left) and equivalence (top right) queries performed by MAT<sup>∗</sup> for s-FAs with different numbers of states. For these s-FAs, MAT<sup>∗</sup> is highly efficient with respect to both the number of membership and equivalence queries, scaling linearly with the number of states. Moreover, we note that the size of the transition set |Δ| does not drastically affect the overall performance of the algorithm. This is in agreement with the results presented in the previous section, where we argued that the difficulty of the underlying predicates, and not their number, is the primary factor affecting performance.

**Fig. 4.** (Top) Evaluation of MAT<sup>∗</sup> on s-FAs over a BDD algebra. (Bottom) Evaluation of MAT<sup>∗</sup> on s-FAs over an s-FA algebra. For an s-FA M<sub>m,n</sub>, the x-axis denotes the values of n. Different lines correspond to different values of m.

#### **6.3 s-FA Algebra Learning**

In this experiment, we use MAT<sup>∗</sup> to learn 18 s-FAs over s-FAs, which accept strings of strings. We evaluate the scalability of our algorithms as the difficulty of learning the underlying predicates increases. The internal s-FAs, which we use as predicates, operate over the equality algebra and are denoted I<sub>k</sub> (where 2 ≤ k ≤ 17). Each s-FA I<sub>k</sub> accepts exactly one word a ··· a of length k and has k + 1 states and 2k + 1 transitions. The external s-FAs are denoted M<sub>m,n</sub> (where m ∈ {5, 10, 15} and 2 ≤ n ≤ 17). Each s-FA M<sub>m,n</sub> accepts exactly one word s ··· s of length m, where each s is accepted by I<sub>n</sub>.

*Analysis.* For simplicity, let us assume that we have the s-FA M<sub>n,n</sub>. Consider a membership query performed by one of the underlying algebra learning instances. Answering the membership query requires sifting a sequence through the classification tree of height at most n, which requires O(n) membership queries. Therefore, the number of membership queries required to learn each individual predicate is increased by a factor of O(n). Moreover, for each equivalence query performed by an algebra learning instance, the s-FA learning algorithm has to attribute the counterexample to the specific algebra learning instance, a process which requires log m membership queries, where m is the length of the counterexample. Therefore, we conclude that each underlying guard with n states requires a number of membership queries on the order of O(n<sup>3</sup>) in the worst case and O(n<sup>2</sup> log n) in the best case (since the classification tree has height Ω(log n)), ignoring the queries required for counterexample processing.
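The sifting step referred to above walks from the root of the classification tree to a leaf, asking one membership query per inner node, so a tree of height n costs at most n queries. The sketch below is a schematic reconstruction; the node layout and the toy `member` oracle are illustrative, not the paper's implementation:

```python
# Classification-tree sift: inner nodes hold a distinguishing suffix,
# leaves hold a state label.  Each inner node visited costs one
# membership query, so sifting costs at most height(tree) queries.

class Node:
    def __init__(self, suffix=None, yes=None, no=None, state=None):
        self.suffix, self.yes, self.no, self.state = suffix, yes, no, state

def sift(tree, word, member, counter):
    node = tree
    while node.state is None:
        counter[0] += 1                    # one membership query per level
        node = node.yes if member(word + node.suffix) else node.no
    return node.state

# Toy target (ours): words over {'a'} are accepted iff their length is
# divisible by 3; the tree below distinguishes the three residue states.
member = lambda w: len(w) % 3 == 0
tree = Node(suffix="",
            yes=Node(state=0),             # w itself accepted
            no=Node(suffix="a",
                    yes=Node(state=2),     # w + "a" accepted
                    no=Node(state=1)))

queries = [0]
print(sift(tree, "aaaa", member, queries), "using", queries[0], "queries")
```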

Figure 4 shows the number of membership (bottom left) and equivalence (bottom right) queries, which corroborate the theoretical analysis in the previous paragraph. Indeed, the number of membership queries increases sharply, roughly quadratically in the number of states of the underlying guards. Equivalence queries, on the other hand, are not affected as drastically and increase only linearly.

#### **7 Related Work**

*Learning Finite Automata.* The L<sup>∗</sup> algorithm proposed by Dana Angluin [3] was the first to introduce the notion of a minimally adequate teacher—i.e., learning using membership and equivalence queries—and the first to learn finite automata in polynomial time. Following Angluin's result, L<sup>∗</sup> has been studied extensively [16,17], has been extended to many other models—e.g., to nondeterministic automata [12] and alternating automata [4]—and has found many applications in program analysis [2,5–7,24] and program synthesis [25]. Since finite automata only operate over finite alphabets, all automata that can be learned using variants of L<sup>∗</sup> can also be learned using MAT<sup>∗</sup>.

*Learning Symbolic Automata.* The problem of scaling L<sup>∗</sup> to large alphabets was initially studied outside the setting of s-FAs using alphabet abstractions [14,15]. The first algorithm for symbolic automata over ordered alphabets was proposed in [20], but that algorithm assumes that the counterexamples provided to the learner are of minimal length. Argyros et al. [6] proposed the first algorithm for learning symbolic automata in the standard MAT model and also described an algorithm to distinguish counterexamples leading to new states from counterexamples due to invalid predicates, which we adapt in MAT<sup>∗</sup>. Drews and D'Antoni [11] proposed a symbolic extension to the L<sup>∗</sup> algorithm, gave a general definition of learnability, and demonstrated more learnable algebras, such as union and product algebras. The algorithms in [6,11,19] are all extensions of L<sup>∗</sup> and assume the existence of an underlying learning algorithm capable of learning partitions of the domain from counterexamples. MAT<sup>∗</sup> does not require that the predicate learning algorithms be able to learn partitions, thus allowing existing learning algorithms for Boolean algebras to be plugged in easily. Moreover, MAT<sup>∗</sup> allows the underlying algebra learning algorithms to perform both equivalence and membership queries, a capability not present in any previous work, thus expanding the class of s-FAs which can be efficiently learned.

*Learning Other Models.* Argyros et al. [6] and Botincan et al. [7] presented algorithms for learning restricted families of symbolic transducers—i.e., symbolic automata with outputs. Other algorithms can learn nominal [21] and register automata [8]. In these models, the alphabet is infinite but not structured (i.e., it does not form a Boolean algebra) and characters at different positions can be compared using binary relations.

**Acknowledgements.** The authors would like to thank the anonymous reviewers for their valuable comments. Loris D'Antoni was supported by National Science Foundation Grants CCF-1637516, CCF-1704117 and a Google Research Award. George Argyros was supported by the Office of Naval Research (ONR) through contract N00014-12-1-0166.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Runtime Verification, Hybrid and Timed Systems

### **Reachable Set Over-Approximation for Nonlinear Systems Using Piecewise Barrier Tubes**

Hui Kong<sup>1(B)</sup>, Ezio Bartocci<sup>2</sup>, and Thomas A. Henzinger<sup>1</sup>

<sup>1</sup> IST Austria, Klosterneuburg, Austria — hui.kong@ist.ac.at
<sup>2</sup> TU Wien, Vienna, Austria

**Abstract.** We address the problem of analyzing the reachable set of a polynomial nonlinear continuous system by over-approximating the flowpipe of its dynamics. The common approach to tackle this problem is to perform a numerical integration over a given time horizon based on Taylor expansion and interval arithmetic. However, this method turns out to be very conservative when there is a large difference in speed between trajectories as time progresses. In this paper, we propose to use combinations of barrier functions, which we call piecewise barrier tubes (PBTs), to over-approximate the flowpipe. The basic idea of PBT is that, for each segment of a flowpipe, a coarse box big enough to contain the segment is constructed using sampled simulation; then, inside the box, we compute by linear programming a set of barrier functions (called a barrier tube, or BT for short) which work together to form a tube surrounding the flowpipe. The benefit of using PBT is that (1) a BT is independent of time and hence avoids being stretched and deformed by time; and (2) a small number of BTs can form a tight over-approximation of the flowpipe, which means that the computation required to decide whether the BTs intersect the unsafe set can be reduced significantly. We implemented a prototype called PBTS in C++. Experiments on some benchmark systems show that our approach is effective.

#### **1 Introduction**

Hybrid systems [17] are widely used to model dynamical systems which exhibit both discrete and continuous behaviors. The reachability analysis of hybrid systems has been a challenging problem over the last few decades. The core difficulty of this problem lies in dealing with the continuous behavior of systems described by ordinary differential equations (ODEs). Although there are currently several quite efficient and scalable approaches for the reachability analysis of linear systems [8–10,14,16,19,20,26,34], nonlinear ODEs are much harder

This research was supported by the Austrian Science Fund (FWF) under grants S11402-N23, S11405-N23 (RiSE/SHiNE) and Z211-N23 (Wittgenstein Award).

© The Author(s) 2018

H. Chockler and G. Weissenbacher (Eds.): CAV 2018, LNCS 10981, pp. 449–467, 2018. https://doi.org/10.1007/978-3-319-96145-3\_24

to handle, and the current approaches can be categorized into the following groups.

*Invariant Generation* [18,21,22,27,28,36,37,39]. An invariant I for a system S is a set such that any trajectory of S originating from I never escapes from I. Therefore, finding an invariant I such that the initial set satisfies I<sub>0</sub> ⊆ I and the unsafe set satisfies U ∩ I = ∅ establishes the safety of the system. In this way, there is no need to compute the flowpipe. The main problem with invariant generation is that it is hard to define a set of high-quality constraints which can be solved efficiently.

*Abstraction and Hybridization* [2,11,24,31,35]. The basic idea of the abstraction-based approach is to first construct a linear model which over-approximates the original nonlinear dynamics and then apply techniques for linear systems to the abstract model. However, how to construct an abstraction with the fewest discrete states and sufficiently high accuracy is still a challenging issue.

*Satisfiability Modulo Theories (SMT) Over Reals* [6,7,23]. This approach encodes the reachability problem for nonlinear systems as first-order logic formulas over the real numbers. These formulas can be solved using, for example, δ-complete decision procedures that overcome the theoretical limits of nonlinear theories over the reals by choosing a desired precision δ. An SMT solver implementing such procedures returns either unsat if the reachability problem is unsatisfiable or δ-sat if the problem is satisfiable at the chosen precision. The δ-sat verdict does not guarantee that the dynamics of the system will reach a particular region: it may happen that increasing the precision makes the problem unsat. In general, the limitation of this approach is that it does not provide a complete and comprehensive description of the reachable set.

*Bounded-Time Flowpipe Computation* [1,3–5,25,32]. The common technique to compute a bounded flowpipe is based on interval methods or Taylor models. The interval-based approach is quite efficient even for high-dimensional systems [29], but it suffers from the wrapping effect of intervals and can quickly accumulate over-approximation errors. In contrast, the Taylor-model-based approach is more precise in that it uses a vector of polynomials plus a vector of small intervals to symbolically represent the flowpipe. However, for the purpose of safety verification or reachability analysis, the Taylor model has to be further over-approximated by intervals, which may bring back the wrapping effect. In particular, the wrapping effect can explode easily when the flowpipe segment over a time interval is stretched drastically due to a large difference in speed between individual trajectories. This case is demonstrated by the following example.

*Example 1 (Running example).* Consider the 2D system [30] described by ẋ = y and ẏ = x<sup>2</sup>. Let the initial set *X<sup>0</sup>* be the line segment x ∈ [1.0, 1.0], y ∈ [−1.05, −0.95]. Figure 1a shows the simulation result for three points in *X<sup>0</sup>* over the time interval [0, 6.6]. The reachable set at t = 6.6 s is a smooth curve connecting the end points of the three trajectories. As can be seen, the trajectory originating from the top is left far behind the one originating from the bottom, which means that the tiny initial line segment is being stretched into a huge curve very quickly,

**Fig. 1.** (a) Simulation for Example 1 showing flowpipe segment being extremely stretched and deformed, (b) Interval over-approximation of the Taylor model computed by *Flow\** [3].

while the width of the flowpipe is actually converging to 0. As a result, the interval over-approximation of this huge curve can be extremely conservative even if its Taylor model representation is precise, and reducing the time step size does not help. To demonstrate this point, we computed with *Flow\** [3] a Taylor model series for the time horizon of 6.6 s, which consists of 13,200 Taylor models. Figure 1b shows the interval approximation of the Taylor model series, which visibly starts to explode.
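The stretching in Example 1 can be reproduced with a plain numerical simulation. The following is a crude fixed-step Euler sketch (the integrator and step size are ours, not the tool's), intended only to show the qualitative behavior:

```python
# Fixed-step Euler simulation of x' = y, y' = x^2 from three points of
# the initial segment x = 1.0, y in [-1.05, -0.95] (Example 1).

def simulate(x, y, t_end=6.6, dt=1e-4):
    trace = [(x, y)]
    for _ in range(int(t_end / dt)):
        x, y = x + dt * y, y + dt * x * x
        trace.append((x, y))
    return trace

for y0 in (-1.05, -1.00, -0.95):
    trace = simulate(1.0, y0)
    print(y0, "->", trace[-1])
```

Note that y is non-decreasing along every trajectory (since ẏ = x² ≥ 0), while x first decreases and then grows, which is what spreads the endpoints apart.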

In this paper, we propose to use piecewise barrier tubes (PBTs) to over-approximate flowpipes of polynomial nonlinear systems, which avoids the issue caused by the excessive stretching of a flowpipe segment. The idea of PBT is inspired by barrier certificates [22,33]. A barrier certificate B(*x*) is a real-valued function such that (1) B(*x*) ≥ 0 for all *x* in the initial set *X<sup>0</sup>*; (2) B(*x*) < 0 for all *x* in the unsafe set *X<sup>U</sup>*; (3) no trajectory can escape from {*x* ∈ R<sup>n</sup> | B(*x*) ≥ 0} through the boundary {*x* ∈ R<sup>n</sup> | B(*x*) = 0}. A sufficient condition for this constraint is that the Lie derivative of B(*x*) w.r.t. the dynamics *x*˙ = *f* is positive all over the invariant region, i.e., L<sub>*f*</sub>B(*x*) > 0, which means that all trajectories must move in the increasing direction of the level sets of B(*x*).

Barrier certificates can be used to verify safety properties without computing the flowpipe explicitly. The essential idea is to use the zero level set of B(*x*) as a barrier separating the flowpipe from the unsafe set. Moreover, if the unsafe set is very close to the boundary of the flowpipe, the barrier has to fit the shape of the flowpipe to make sure that all components of the constraint are satisfied. However, the zero level set of a polynomial of fixed degree may not have the power to mimic the shape of the flowpipe, which means that there may exist no solution for the above constraints even if the system is safe. This problem might be addressed using piecewise barrier certificates, i.e., cutting the flowpipe into pieces small enough that every piece is straight enough to admit a barrier certificate of simple form. Unfortunately, this is infeasible because we know nothing about the flowpipe locally. Therefore, we have to find another way to proceed.

Instead of computing a single barrier certificate, we propose to compute barrier tubes to piecewise over-approximate the flowpipe. Concretely, we first construct a containing box, called an **enclosure**, for the initial set using an interval approach [29] and simulation; then, using linear programming, we compute a group of barrier functions which together form a tight tube (called a barrier tube) around the flowpipe. Next, taking the intersection of the barrier tube and the boundary of the box as the new initial set, we repeat the previous operations to obtain successive barrier tubes step by step. The key point is how to compute a group of tightly enclosing barriers around the flowpipe without a constraint on the unsafe set inside the box. Our basic idea is to construct a group of auxiliary state sets U around the flowpipe and then, for each U<sub>i</sub> ∈ U, compute a barrier certificate between U<sub>i</sub> and the flowpipe. If a barrier certificate is found, we expand U<sub>i</sub> towards the flowpipe iteratively until no more barrier certificates can be found; otherwise, we shrink U<sub>i</sub> away from the flowpipe until a barrier certificate is found. Since the auxiliary sets are distributed around the flowpipe, the barrier tube surrounds the flowpipe as well. The benefit of such piecewise barrier tubes is that they are time independent, and hence avoid the issue of stretched flowpipe segments caused by speed differences between trajectories. Moreover, a small number of BTs can usually form a tight over-approximation of the flowpipe, which means that less computation is needed to decide whether the PBT intersects the unsafe set.
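The expand/shrink iteration for one auxiliary set can be sketched as a search over a single offset parameter. This is entirely schematic: `has_barrier` stands in for the linear-programming feasibility check, and the one-dimensional offset parameterization is our own simplification of the expand/shrink moves:

```python
# Sketch of the expand/shrink loop for one auxiliary set U_i: offset 0
# places U_i far from the flowpipe, offset 1 touches it.  We search for
# the largest offset at which a barrier between U_i and the flowpipe
# still exists, i.e., the tightest position of that wall of the tube.

def tighten(has_barrier, tol=1e-3):
    lo, hi = 0.0, 1.0            # invariant: a barrier exists at lo
    if not has_barrier(lo):
        return None              # no barrier even at the loosest position
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if has_barrier(mid):
            lo = mid             # expand U_i towards the flowpipe
        else:
            hi = mid             # shrink U_i away from the flowpipe
    return lo

# Toy feasibility oracle (ours): a barrier exists while offset < 0.42.
print(tighten(lambda off: off < 0.42))
```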

The main contributions of this paper are as follows:


The paper is organized as follows. Section 2 is devoted to the preliminaries. Section 3 shows how to compute barrier certificates using Handelman representation, while in Sect. 4 we present a method to compute Piecewise Barrier Tubes. Section 5 provides our experimental results and we conclude in Sect. 6.

#### **2 Preliminaries**

In this section, we recall some concepts used throughout the paper. We first clarify some notational conventions. Unless specified otherwise, we use boldface lower-case letters to denote vectors, R for the real number field, and N for the set of natural numbers, and we consider multivariate polynomials in R[*x*], where the components of *x* act as indeterminates. In addition, for a polynomial B(*u*, *x*), we denote by *u* the vector composed of all the u<sub>i</sub> and by *x* the vector composed of all the remaining variables x<sub>i</sub> that occur in the polynomial. We use R<sub>≥0</sub> and R<sub>>0</sub> to denote the sets of nonnegative and positive real numbers, respectively.

Let P ⊆ R<sup>n</sup> be a convex and compact polyhedron with non-empty interior, bounded by linear polynomials p<sub>1</sub>, ··· , p<sub>m</sub> ∈ R[*x*]. Without loss of generality, we may assume P = {*x* ∈ R<sup>n</sup> | p<sub>i</sub>(*x*) ≥ 0, i = 1, ··· , m}.

Next, we present the notion of the Lie derivative, which is widely used in differential geometry. Let *f* : R<sup>n</sup> → R<sup>n</sup> be a continuous vector field such that ẋ<sub>i</sub> = f<sub>i</sub>(*x*), where ẋ<sub>i</sub> is the time derivative of x<sub>i</sub>(t).

**Definition 1 (Lie derivative).** *For a given polynomial* p ∈ R[*x*] *over x* = (x<sub>1</sub>,...,x<sub>n</sub>) *and a continuous system x*˙ = *f, where f* = (f<sub>1</sub>,...,f<sub>n</sub>)*, the Lie derivative of* p ∈ R[*x*] *along f of order* k *is defined as follows.*

$$\mathcal{L}\_f^k p \stackrel{def}{=} \begin{cases} p, & k=0\\ \sum\_{i=1}^n \frac{\partial \mathcal{L}\_f^{k-1} p}{\partial x\_i} \cdot f\_i, & k \ge 1 \end{cases}$$

Essentially, the k-th order Lie derivative of p is the k-th derivative of p w.r.t. time, i.e., it reflects the change of p over time. We write L<sub>*f*</sub> p for L<sup>1</sup><sub>*f*</sub> p.
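The recursion in Definition 1 can be computed mechanically on polynomials. The sketch below uses our own minimal encoding (a polynomial as a map from exponent tuples to coefficients) and the dynamics of the running example, ẋ = y, ẏ = x²:

```python
# Polynomials over (x1, ..., xn) encoded as {exponent-tuple: coefficient}.

def padd(p, q):
    r = dict(p)
    for e, c in q.items():
        r[e] = r.get(e, 0) + c
    return {e: c for e, c in r.items() if c}

def pmul(p, q):
    r = {}
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            e = tuple(a + b for a, b in zip(e1, e2))
            r[e] = r.get(e, 0) + c1 * c2
    return {e: c for e, c in r.items() if c}

def pdiff(p, i):
    r = {}
    for e, c in p.items():
        if e[i]:
            e2 = list(e); e2[i] -= 1
            r[tuple(e2)] = r.get(tuple(e2), 0) + c * e[i]
    return r

def lie(p, f, k=1):
    # k-th order Lie derivative: L_f^k p = sum_i d(L_f^{k-1} p)/dx_i * f_i
    for _ in range(k):
        acc = {}
        for i, fi in enumerate(f):
            acc = padd(acc, pmul(pdiff(p, i), fi))
        p = acc
    return p

# Running-example dynamics (Example 1): x' = y, y' = x^2
x  = {(1, 0): 1}
y  = {(0, 1): 1}
x2 = {(2, 0): 1}
f  = [y, x2]

print(lie(x, f))       # {(0, 1): 1}, i.e. L_f x = y
print(lie(x, f, k=2))  # {(2, 0): 1}, i.e. L_f^2 x = x^2
```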

In this paper, we focus on semialgebraic nonlinear systems, which are defined as follows.

**Definition 2 (Semialgebraic system).** *A semialgebraic system is a tuple* M = ⟨X, *f*, *X<sup>0</sup>*, I⟩*, where*


Local Lipschitz continuity guarantees the local existence and uniqueness of the solution of the differential equation *x***˙** = *f*. A trajectory of a semialgebraic system is defined as follows.

**Definition 3 (Trajectory).** *Given a semialgebraic system* M*, a trajectory originating from a point x*<sub>0</sub> ∈ *X<sup>0</sup> up to time* T > 0 *is a continuous and differentiable function ζ*(*x*<sub>0</sub>, t) : [0, T) → R<sup>n</sup> *such that (1) ζ*(*x*<sub>0</sub>, 0) = *x*<sub>0</sub>*, and (2)* ∀τ ∈ [0, T): dζ/dt|<sub>t=τ</sub> = *f*(*ζ*(*x*<sub>0</sub>, τ))*.* T *is assumed to be within the maximal interval of existence of the solution from x*<sub>0</sub>*.*

For ease of readability, we also write ζ(t) for ζ(*x*<sub>0</sub>, t). In addition, we use Flow<sub>*f*</sub>(*X<sup>0</sup>*) to denote the flowpipe of the initial set *X<sup>0</sup>*, i.e.,

$$Flow\_f(X\_0) \stackrel{\text{def}}{=} \{ \boldsymbol{\zeta}(\boldsymbol{x}\_0, t) \mid \boldsymbol{x}\_0 \in X\_0, t \in \mathbb{R}\_{\geq 0}, \dot{\boldsymbol{\zeta}} = \boldsymbol{f}(\boldsymbol{\zeta}) \}\tag{1}$$

**Definition 4 (Safety).** *Given an unsafe set X<sup>U</sup>* ⊆ X*, a semialgebraic system* M = ⟨X, *f*, *X<sup>0</sup>*, I⟩ *is said to be safe if no trajectory ζ*(*x*<sub>0</sub>, t) *of* M *with x*<sub>0</sub> ∈ *X<sup>0</sup> satisfies* ∃τ ∈ R<sub>≥0</sub> : *ζ*(*x*<sub>0</sub>, τ) ∈ *X<sup>U</sup>.*

#### **3 Computing Barrier Certificates**

Given a semialgebraic system M, a barrier certificate is a real-valued function B(*x*) such that (1) B(*x*) ≥ 0 for all *x* in the initial set; (2) B(*x*) < 0 for all *x* in the unsafe set; and (3) no trajectory can escape from the region B(*x*) ≥ 0. Then, the hyper-surface {*x* ∈ R<sup>n</sup> | B(*x*) = 0} forms a barrier separating the flowpipe from the unsafe set. To compute such a barrier certificate, the most common approach is template-based constraint solving: first derive a sufficient condition for the above requirements, then set up a template polynomial B(*u*, *x*) of fixed degree, and finally solve the constraints on *u* derived from the sufficient condition on B(*u*, *x*). There are a couple of sufficient conditions available for this purpose [13,22,27]. In order to have an efficient constraint-solving method, we adopt the following condition [33].

**Theorem 1.** *Given a semialgebraic system* M*, let X<sup>0</sup> and X<sup>U</sup> be the initial set and the unsafe set, respectively. The system is guaranteed to be safe if there exists a real-valued function* B(*x*) *such that*

$$\forall x \in X\_0 \,:\, B(x) > 0 \tag{2}$$

$$\forall \mathbf{x} \in I : \mathcal{L}\_f B > 0 \tag{3}$$

$$\forall x \in X\_U: B(x) < 0 \tag{4}$$

In Theorem 1, condition (3) means that all trajectories of the system always point in the increasing direction of the level sets of B(*x*) in the region I. Therefore, no trajectory starting from the initial set can cross the zero level set. The benefit of this condition is that it can be solved more efficiently than other existing conditions [13,22], although it is relatively conservative. The most widely used approach is to transform the constraint-solving problem into a sum-of-squares (*SOS*) programming problem [33], which can be solved in polynomial time. However, a serious problem with the *SOS*-programming-based approach is that automatic generation of polynomial templates is very hard to perform. We now show an example to demonstrate the reason. For simplicity, we assume that the initial set, the unsafe set, and the invariant are defined by the polynomial inequalities *X<sup>0</sup>*(*x*) ≥ 0, *X<sup>U</sup>*(*x*) ≥ 0 and I(*x*) ≥ 0, respectively; then the *SOS* relaxation of Theorem 1 requires that the following polynomials all be *SOS*

$$B(\mathbf{x}) - \mu\_1(\mathbf{x})X\_0(\mathbf{x}) + \epsilon\_1 \tag{5}$$

$$\mathcal{L}\_f B - \mu\_2(\mathbf{x}) I(\mathbf{x}) + \epsilon\_2 \tag{6}$$

$$-B(\mathbf{x}) - \mu\_3(\mathbf{x})X\_U(\mathbf{x}) + \epsilon\_3 \tag{7}$$

where the μ<sub>i</sub>(*x*), i = 1, ··· , 3 are *SOS* polynomials as well and ε<sub>i</sub> > 0, i = 1, ··· , 3. Suppose the degrees of *X<sup>0</sup>*(*x*), I(*x*) and *X<sup>U</sup>*(*x*) are all odd. Then, the degree of the template for B(*x*) must be odd too. The reason is that, if deg(B) were even, then in order for the first and third polynomials to be *SOS*, deg(B) would have to be greater than both deg(μ<sub>3</sub>*X<sup>U</sup>*) and deg(μ<sub>1</sub>*X<sup>0</sup>*), which are odd. However, since the first and third conditions contain B(*x*) and −B(*x*) respectively, their leading monomials have opposite signs, which means that they cannot both be *SOS* polynomials simultaneously. Moreover, the degrees of the templates for the auxiliary polynomials μ<sub>1</sub>(*x*), μ<sub>3</sub>(*x*) must also be chosen so that deg(μ<sub>1</sub>*X<sup>0</sup>*) = deg(μ<sub>3</sub>*X<sup>U</sup>*) = deg(B), because only in this way can the leading monomials (which have odd degree) of (5) and (7) cancel so that the resulting polynomials can be *SOS*. Similarly, in order to make the second polynomial an *SOS* as well, one has to choose an appropriate degree for μ<sub>2</sub>(*x*) according to the degrees of L<sub>*f*</sub>B and I(*x*). As a result, the tangled constraints on the relevant template polynomials reduce the power of *SOS* programming significantly.

Due to the above reason, and inspired by the work [38], we use the Handelman representation to relax Theorem 1. We assume that the initial set *X<sup>0</sup>*, the unsafe set *X<sup>U</sup>* and the invariant I are all convex and compact polyhedra, i.e., *X<sup>0</sup>* = {*x* ∈ R<sup>n</sup> | p<sub>1</sub>(*x*) ≥ 0, ··· , p<sub>m1</sub>(*x*) ≥ 0}, I = {*x* ∈ R<sup>n</sup> | q<sub>1</sub>(*x*) ≥ 0, ··· , q<sub>m2</sub>(*x*) ≥ 0} and *X<sup>U</sup>* = {*x* ∈ R<sup>n</sup> | r<sub>1</sub>(*x*) ≥ 0, ··· , r<sub>m3</sub>(*x*) ≥ 0}, where the p<sub>i</sub>(*x*), q<sub>j</sub>(*x*), r<sub>k</sub>(*x*) are linear polynomials. Then, we have the following theorem.

**Theorem 2.** *Given a semialgebraic system* M*, let X<sup>0</sup>, X<sup>U</sup> and* I *be defined as above. The system is guaranteed to be safe if there exists a real-valued polynomial function* B(*x*) *such that*

$$B(\mathbf{x}) \equiv \sum\_{|\alpha| \le M\_1} \lambda\_\alpha p\_1^{\alpha\_1} \cdots p\_{m\_1}^{\alpha\_{m\_1}} + \epsilon\_1 \tag{8}$$

$$\mathcal{L}\_f B \equiv \sum\_{|\beta| \le M\_2} \lambda\_\beta q\_1^{\beta\_1} \cdots q\_{m\_2}^{\beta\_{m\_2}} + \epsilon\_2 \tag{9}$$

$$-B(x) \equiv \sum\_{|\gamma| \le M\_3} \lambda\_\gamma r\_1^{\gamma\_1} \cdots r\_{m\_3}^{\gamma\_{m\_3}} + \epsilon\_3 \tag{10}$$

*where* λ<sub>*α*</sub>, λ<sub>*β*</sub>, λ<sub>*γ*</sub> ∈ R<sub>≥0</sub>*,* ε<sub>i</sub> ∈ R<sub>>0</sub> *and* M<sub>i</sub> ∈ N, i = 1, ··· , 3*.*

Theorem 2 provides us with an alternative to *SOS* programming for finding a barrier certificate B(*x*), by transforming the search into a linear programming problem. The basic idea is that we first set up a template B(*u*, *x*) of fixed degree, as well as appropriate M<sub>i</sub>, i = 1, ··· , 3 that make both sides of the three identities (8)–(10) have the same degree. Since (8)–(10) are identities, the coefficients of the corresponding monomials on both sides must be identical as well. Thus, we derive a system S of linear equations and inequalities over *u*, λ<sub>*α*</sub>, λ<sub>*β*</sub>, λ<sub>*γ*</sub>. Finding a barrier certificate then amounts to finding a feasible solution of S, which can be done by linear programming. Compared to the *SOS*-programming-based approach, this approach is more flexible in choosing the polynomial template as well as other parameters. We now consider a linear system to show how it works.

**Fig. 2.** (a) Linear barrier certificate (straight red line) for Example 2. Rectangle in green: initial set; rectangle in red: unsafe set. (b) PBT for the running Example 5, consisting of 45 BTs. (c) Enclosure (before bloating) for the flowpipe of Example 3 (green shaded region). (d) Enclosure (after bloating) for the flowpipe of Example 3. (Color figure online)

*Example 2.* Given a 2D system defined by ẋ = 2x + 3y, ẏ = −4x + 2y, let *X<sup>0</sup>* = {(x, y) ∈ R<sup>2</sup> | p<sub>1</sub> = x + 100 ≥ 0, p<sub>2</sub> = −90 − x ≥ 0, p<sub>3</sub> = y + 45 ≥ 0, p<sub>4</sub> = −40 − y ≥ 0}, I = {(x, y) ∈ R<sup>2</sup> | q<sub>1</sub> = x + 110 ≥ 0, q<sub>2</sub> = −80 − x ≥ 0, q<sub>3</sub> = y + 45 ≥ 0, q<sub>4</sub> = −20 − y ≥ 0} and *X<sup>U</sup>* = {(x, y) ∈ R<sup>2</sup> | r<sub>1</sub> = x + 98 ≥ 0, r<sub>2</sub> = −90 − x ≥ 0, r<sub>3</sub> = y + 24 ≥ 0, r<sub>4</sub> = −20 − y ≥ 0}. Assume B(*u*, *x*) = u<sub>1</sub> + u<sub>2</sub>x + u<sub>3</sub>y and M<sub>i</sub> = ε<sub>i</sub> = 1 for i = 1, ··· , 3; then we obtain the following polynomial identities according to Theorem 2

$$\begin{aligned} u\_1 + u\_2x + u\_3y - \sum\_{i=1}^4 \lambda\_{1i} p\_i - \epsilon\_1 &\equiv 0\\ u\_2(2x + 3y) + u\_3(-4x + 2y) - \sum\_{j=1}^4 \lambda\_{2j} q\_j - \epsilon\_2 &\equiv 0\\ -\left(u\_1 + u\_2x + u\_3y\right) - \sum\_{k=1}^4 \lambda\_{3k} r\_k - \epsilon\_3 &\equiv 0 \end{aligned}$$

where $\lambda_{ij} \geq 0$ for $i = 1, \cdots, 3$, $j = 1, \cdots, 4$. By collecting the coefficients of $x$ and $y$ in the above polynomials, we obtain a system S of linear equations and inequalities over $u_i$, $\lambda_{jk}$. Solving S by linear programming yields a feasible solution; Fig. 2a shows the computed linear barrier certificate. Note that, for the aforementioned reason, it is impossible to find a linear barrier certificate for this example using *SOS* programming.
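The coefficient matching above can be carried out with an off-the-shelf LP solver. The following sketch solves the feasibility problem of Example 2 (assuming SciPy; the paper's tool uses Gurobi as its LP back end, and the margins ε₁ = ε₂ = ε₃ = 1 are an assumed choice for illustration):

```python
import numpy as np
from scipy.optimize import linprog

eps = 1.0  # assumed margins eps_1 = eps_2 = eps_3

# Unknowns: u1, u2, u3 (free), then lambda_11..14, lambda_21..24, lambda_31..34 (>= 0).
# Each row of A_eq matches the coefficient of x, of y, or the constant term
# in one of the three identities.
A_eq = np.zeros((9, 15)); b_eq = np.zeros(9)
u1, u2, u3 = 0, 1, 2
l1, l2, l3 = 3, 7, 11  # start indices of lambda_1*, lambda_2*, lambda_3*

# Identity 1: B - sum(l1i * pi) - eps1 == 0, with p = (x+100, -90-x, y+45, -40-y)
A_eq[0, [u2, l1+0, l1+1]] = [1, -1, 1]            # coefficient of x
A_eq[1, [u3, l1+2, l1+3]] = [1, -1, 1]            # coefficient of y
A_eq[2, [u1, l1+0, l1+1, l1+2, l1+3]] = [1, -100, 90, -45, 40]; b_eq[2] = eps
# Identity 2: L_f B - sum(l2j * qj) - eps2 == 0, with q = (x+110, -80-x, y+45, -20-y)
A_eq[3, [u2, u3, l2+0, l2+1]] = [2, -4, -1, 1]    # coefficient of x
A_eq[4, [u2, u3, l2+2, l2+3]] = [3, 2, -1, 1]     # coefficient of y
A_eq[5, [l2+0, l2+1, l2+2, l2+3]] = [-110, 80, -45, 20]; b_eq[5] = eps
# Identity 3: -B - sum(l3k * rk) - eps3 == 0, with r = (x+98, -90-x, y+24, -20-y)
A_eq[6, [u2, l3+0, l3+1]] = [-1, -1, 1]
A_eq[7, [u3, l3+2, l3+3]] = [-1, -1, 1]
A_eq[8, [u1, l3+0, l3+1, l3+2, l3+3]] = [-1, -98, 90, -24, 20]; b_eq[8] = eps

bounds = [(None, None)] * 3 + [(0, None)] * 12    # u free, lambdas nonnegative
res = linprog(c=np.zeros(15), A_eq=A_eq, b_eq=b_eq, bounds=bounds)
if res.success:
    print("barrier certificate B(x,y) = %.3f + %.3f*x + %.3f*y" % tuple(res.x[:3]))
```

Any feasible point of this LP yields a linear barrier certificate: the identities force B ≥ ε₁ on X₀, L_f B ≥ ε₂ on I, and B ≤ −ε₃ on X_U.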

#### **4 Piecewise Barrier Tubes**

In this section, we describe how to construct PBTs for nonlinear polynomial systems. The basic idea is that, for each segment of the flowpipe, an enclosure box is first constructed; then a BT is constructed inside the box to form a tighter over-approximation of the flowpipe segment.

#### **4.1 Constructing an Enclosure Box**

Given an initial set, the first task is to construct an enclosure box for the initial set and the following segment of the flowpipe. As pointed out in Sect. 1, one principle for constructing an enclosure box is to simplify the shape of the flowpipe segment or, in other words, to approximately bound the twisting of the trajectories in the box by some θ, where the *twisting* of a trajectory is defined as follows.

**Definition 5 (Twisting of a trajectory).** *Let* M *be a continuous system and* ζ(t) *a trajectory of* M*. Then* ζ(t) *is said to have a twisting of* θ *on the time interval* I = [T<sub>1</sub>, T<sub>2</sub>]*, written* ξ<sub>I</sub>(ζ) = θ*, where*

$$\xi_I(\zeta) \stackrel{\mathrm{def}}{=} \sup_{t_1, t_2 \in I} \arccos \frac{\langle \dot{\zeta}(t_1), \dot{\zeta}(t_2) \rangle}{\|\dot{\zeta}(t_1)\| \, \|\dot{\zeta}(t_2)\|}.$$
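On a sampled trajectory, the twisting can be approximated by maximizing the angle between sampled derivative vectors; a minimal sketch (the sampling scheme is an assumption of this illustration):

```python
import numpy as np

def twisting(dxdt, traj):
    """Approximate the twisting of a sampled trajectory: the largest angle
    (in radians) between the derivative vectors at any two sample points.
    dxdt: vector field f(x); traj: array of sampled states, shape (k, n)."""
    d = np.array([dxdt(x) for x in traj])
    d = d / np.linalg.norm(d, axis=1, keepdims=True)   # normalize derivatives
    # arccos of all pairwise inner products; clip guards against rounding noise
    angles = np.arccos(np.clip(d @ d.T, -1.0, 1.0))
    return angles.max()
```

For instance, under any constant vector field the twisting of every sampled trajectory is 0, while a rotating field yields the angle swept between the sample points.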

The basic idea to construct an enclosure box is depicted in Algorithm 1.


#### **Algorithm 1.** Constructing an enclosure box

```
input : M: dynamics of the system; n: dimension of the system; X0: initial set;
        θ: twisting bound of simulation; d: maximum distance of simulation;
output: E: an enclosure box containing X0; P: plane where the flowpipe exits;
        G: range of the intersection of Flow_f(X0) with plane P, by simulation
1  sample a set S0 of points from X0;
2  select a point x0 from S0;
3  find a time step size ΔT0 by (θ, d)-bounded simulation for x0;
4  ΔT ←− ΔT0;
5  while true do
6      [found, E] ←− find an enclosure box by interval arithmetic using ΔT;
7      if found then
8          simulate all points in S0 to find the exit plane P and the range G;
9          bloat E s.t. Flow_f(X0) gets out of E only through the facet in P;
10         break;
11     else
12         ΔT ←− 1/2 ∗ ΔT;
```
*Remark 1.* In Algorithm 1, we use interval arithmetic [29] and simulation to construct an enclosure box E for a given initial set and its following flowpipe segment. Meanwhile, we obtain a coarse range of the intersection of the flowpipe with the boundary of the enclosure, which helps to accelerate the construction of the barrier tube. For simplicity, the enclosure is constructed in such a way that the flowpipe leaves the box through a single facet. Given an initial set *X<sup>0</sup>*, we first sample a set S<sup>0</sup> of points from *X<sup>0</sup>* for simulation. Then, we select a point *x*<sup>0</sup> from S<sup>0</sup> and perform a (θ, d)-simulation on *x*<sup>0</sup> to obtain a time step ΔT. A (θ, d)-simulation is a simulation that stops either when the twisting of the simulation reaches θ or when the distance between x<sup>0</sup> and the end point reaches d. On the one hand, by using a small θ, we aim to obtain a straight flowpipe segment. On the other hand, by specifying a maximal distance d, we make sure that the simulation terminates even for a long and straight flowpipe. At each iteration of the *while* loop in line 5, we first try to construct an enclosure box by interval arithmetic over ΔT. If such a box is created, we then perform a simulation (see line 8) for all the points in S<sup>0</sup> to find the plane P of the facet that intersects the most simulations. The idea behind line 9 is that, in order to better over-approximate the intersection of the flowpipe with the boundary of the box using intervals, we push the other planes outwards to make P the only plane through which the flowpipe gets out of the box. Of course, simulation alone cannot guarantee that the flowpipe does not intersect the other facets. Therefore, we have the following theorem for this decision.
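A (θ, d)-bounded simulation as described above can be sketched with explicit Euler steps (the step size and the cheap twisting proxy are assumptions of this illustration, not the paper's implementation):

```python
import numpy as np

def theta_d_simulation(f, x0, theta, d, h=1e-3, max_steps=100000):
    """Integrate x' = f(x) with Euler steps until either the twisting of the
    run reaches theta or the distance from x0 reaches d; return the elapsed
    time, which serves as the step size Delta-T0 in Algorithm 1.
    Assumes f does not vanish along the run."""
    x = np.array(x0, float)
    dirs = [f(x) / np.linalg.norm(f(x))]     # normalized derivative directions
    t = 0.0
    for _ in range(max_steps):
        x = x + h * f(x)
        t += h
        v = f(x)
        dirs.append(v / np.linalg.norm(v))
        # twisting proxy: max angle between the first and any later direction
        twist = max(np.arccos(np.clip(np.dot(dirs[0], u), -1.0, 1.0))
                    for u in dirs)
        if twist >= theta or np.linalg.norm(x - x0) >= d:
            break
    return t
```

Under a constant field the twisting stays 0, so the simulation stops exactly when the distance bound d is reached.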

**Theorem 3.** *Given a semialgebraic system* M *and an initial set X<sup>0</sup>, let the box* E *be an enclosure of X<sup>0</sup> and* F<sup>i</sup> *a facet of* E*. Then* (Flow<sup>f</sup>(*X<sup>0</sup>*) ∩ E) ∩ F<sup>i</sup> = ∅ *if there exists a barrier certificate* Bi(*x*) *for X<sup>0</sup> and* F<sup>i</sup> *inside* E*.*

*Remark 2.* By the definition of a barrier certificate, the proof of Theorem 3 is straightforward and is omitted here. Hence, to make sure that the flowpipe does not intersect the facet Fi, we only need to find a barrier certificate, which can be done using the approach presented in Sect. 3. Moreover, if no barrier certificate can be found, we further bloat the facet. Next, we again use the running Example 1 to demonstrate the process of constructing an enclosure.

*Example 3 (running example).* Consider the system in Example 1 and the initial set x = 1.0, −1.05 ≤ y ≤ −0.95, and let the bounding twisting of the simulation be θ = π/18; then the time step size computed for interval evaluation is ΔT = 0.2947. The corresponding enclosure computed by interval arithmetic is shown in Fig. 2c. Furthermore, by simulation, we know that the flowpipe can reach both the left facet and the top facet. Therefore, we have two options: bloat the left facet so that the flowpipe intersects the top facet only, or bloat the top facet so that the flowpipe intersects the left facet only. In this example, we choose the latter option, and the bloated enclosure is shown in Fig. 2d. In this way, we can over-approximate the intersection of the flowpipe and the facet by intervals if we can obtain its boundary on every side. This can be achieved by finding a barrier tube.

#### **4.2 Compute a Barrier Tube Inside a Box**

An important fact about the flowpipe of a continuous system is that it tends to be straight if it is short enough, given that the initial set is straight as well (otherwise, we can split it). If there is a small box E around a straight flowpipe, it is easy to compute a barrier certificate for a given initial set and unsafe set inside E. A barrier tube for the flowpipe in E is a group of barrier certificates that form a tube around the flowpipe inside E. Formally,

**Definition 6 (Barrier Tube).** *Given a semialgebraic system* M*, a box* E*, and an initial set X<sup>0</sup>* ⊆ E*, a barrier tube is a set of real-valued functions* BT = {Bi(*x*), i = 1, ··· , m} *such that for all* Bi(*x*) ∈ BT*: (1)* ∀*x* ∈ *X<sup>0</sup>* : Bi(*x*) > 0 *and (2)* ∀*x* ∈ E : $L_f B_i$(*x*) > 0*.*

According to Definition 6, a barrier tube BT is defined by a set of real-valued functions; every inequality Bi(*x*) > 0 is an invariant of M in E, and so is their conjunction. The property of a barrier tube BT is formally described in the following theorem.

**Theorem 4.** *Given a semialgebraic system* M*, a box* E*, and an initial set X<sup>0</sup>* ⊆ E*, let* BT = {Bi(*x*) : i = 1, ··· , m} *be a barrier tube of* M *and* Ω = {*x* ∈ $\mathbb{R}^n$ | Bi(*x*) > 0, B<sup>i</sup> ∈ BT}*; then* Flow*<sup>f</sup>*(*X<sup>0</sup>*) ∩ E ⊆ Ω ∩ E*.*

*Remark 3.* Theorem 4 states that an arbitrary barrier tube forms an over-approximation of the flowpipe in the box E. Compared to a single barrier certificate, multiple barrier certificates can over-approximate the flowpipe more precisely. However, since Definition 6 places no constraint on unsafe sets, a barrier tube satisfying the definition can be very conservative. To obtain an accurate approximation of the flowpipe, we choose to create additional auxiliary constraints.
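Membership in the over-approximation Ω ∩ E of Theorem 4 is a simple conjunction test; a minimal sketch with a toy tube (the concrete barriers below are illustrative, not derived from the running example):

```python
def in_overapprox(BT, E, x):
    """Membership test for the over-approximation of Theorem 4:
    x lies in Omega ∩ E iff x is inside the box E and every barrier
    certificate in the tube is positive at x.
    E is a list of (lo, hi) bounds per dimension."""
    in_E = all(lo <= xi <= hi for xi, (lo, hi) in zip(x, E))
    return in_E and all(B(x) > 0 for B in BT)

# toy tube: two linear barriers bounding the strip -1 < y < 1
BT = [lambda p: 1 - p[1], lambda p: p[1] + 1]
E = [(-2, 2), (-2, 2)]
```

Adding more barriers to BT can only shrink Ω, which is why more certificates yield a tighter tube.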

**Auxiliary Unsafe Set (AUS).** To obtain an accurate barrier tube, two main questions must be answered: (1) How many barrier certificates are needed? and (2) How do we control their positions to make the tube well shaped, so that it over-approximates the flowpipe better? The answer to the first question is quite simple: the more, the better; this is explained below. For the second question, the answer is to construct a group of properly distributed auxiliary unsafe sets (AUSs). Each AUS is used as an unsafe set U<sup>i</sup> for the system, and we then compute a barrier certificate B<sup>i</sup> for U<sup>i</sup> according to Theorem 2. Since the zero level set of B<sup>i</sup> serves as a barrier between the flowpipe and Ui, the space in which a barrier can appear is fully determined by the position of Ui. Roughly speaking, when U<sup>i</sup> is far away from the flowpipe, the space for a barrier to exist is wide as well, and the barrier certificate found usually lies far away from the flowpipe. As U<sup>i</sup> gets closer to the flowpipe, the space for barrier certificates contracts towards the flowpipe accordingly. Therefore, by expanding U<sup>i</sup> towards the flowpipe, we obtain more precise over-approximations of the flowpipe.

**Why Multiple AUSs?** Although the accuracy of the barrier-certificate over-approximation can be improved by expanding the AUS towards the flowpipe, the capability of a single barrier certificate is very limited, because it can erect a barrier that matches only a single profile of the flowpipe. However, if we have a set U of AUSs distributed evenly around the flowpipe, with a barrier certificate B<sup>i</sup> for each U<sup>i</sup> ∈ U, these barrier certificates can over-approximate the flowpipe from a number of profiles. Therefore, increasing the number of AUSs increases the quality of the over-approximation as well. Furthermore, if all these auxiliary sets are connected, the barriers form a tube surrounding the flowpipe. Therefore, if we can create a series of boxes piecewise covering the flowpipe and construct a barrier tube for every piece of the flowpipe, we obtain an over-approximation of the flowpipe by a PBT.

Based on the above idea, we provide Algorithm 2 to compute a barrier tube.




*Remark 4.* In Algorithm 2, for an n-dimensional flowpipe segment, we aim to build a barrier tube composed of 2(n − 1) barrier certificates, which means we need to construct 2(n − 1) AUSs. From Algorithm 1, we know that the plane P is the only exit of the flowpipe from the enclosure E and that G is roughly the region where they intersect. Let $F^G$ be the facet of E that contains G; then, for every facet $F^G_{ij}$ of $F^G$, we can take an (n − 1)-dimensional rectangle between $F^G_{ij}$ and $G_{ij}$ as an AUS, where $G_{ij}$ is the facet of G adjacent to $F^G_{ij}$. Therefore, enumerating all the facets of G in line 1 produces 2(n − 1) positions for AUSs. The loop in line 3 attempts to find a polynomial barrier certificate of each degree in D. In the while loop in line 5, we iteratively compute the best barrier certificate by adjusting the width of the AUS through binary search, until the difference in width between two successive AUSs is less than the specified threshold ε.
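The width-refinement loop described above can be sketched as follows (`find_bc` is a hypothetical helper standing in for the LP-based certificate search of Sect. 3):

```python
def tighten_aus(find_bc, width_lo, width_hi, eps=1e-3):
    """Binary-search the largest AUS width for which a barrier certificate
    still exists, stopping when the bracket shrinks below eps.
    find_bc(w) returns a certificate for the AUS of width w, or None.
    Assumes width_lo is feasible and width_hi is infeasible."""
    best = None
    while width_hi - width_lo > eps:
        w = (width_lo + width_hi) / 2
        bc = find_bc(w)
        if bc is not None:           # certificate found: try a wider AUS,
            best, width_lo = bc, w   # i.e. push the AUS closer to the flowpipe
        else:
            width_hi = w             # too wide: shrink the AUS again
    return best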

*Example 4 (Running example).* Consider the initial set and the enclosure computed in Example 3; we use Algorithm 2 to compute a barrier tube. The initial set is *X<sup>0</sup>* = [1.0, 1.0] × [−1.05, −0.95], the enclosure of *X<sup>0</sup>* is E = [0.84, 1.01] × [−1.1, −0.75], G = [0.84, 0.84] × [−0.91, −0.80], the plane P is x = 0.84, D = {2}, and ε = 0.001. The barrier tube consists of two barrier certificates. As shown in Fig. 3, each barrier certificate is derived from an AUS (red line segment) located on the bottom-left and top-left boundaries of E, respectively.

**Fig. 3.** Computation of the BT for Example 4. Blue line segment: initial set; red line segment: AUS. Figures (a)–(l) show how the intermediate barrier certificates change with the width of the AUSs, and Fig. (l) shows the final BT (green shaded region). (Color figure online)

#### **4.3 Compute Piecewise Barrier Tube**

During the computation of a barrier tube by Algorithm 2, we create a series of AUSs around the flowpipe, which build up a rectangular enclosure for the intersection of the flowpipe and the facet of the enclosure box. As a result, such a rectangular enclosure can be taken as the initial set for the following flowpipe segment, and Algorithm 2 can then be applied repeatedly to compute a PBT. The basic procedure to compute a PBT is presented in Algorithm 3.

*Remark 5.* In Algorithm 3, a box containing the initial set X<sup>0</sup> is first constructed using Algorithm 1. The loop in line 2 consists of three major parts: (1) in lines 3–6, a barrier tube BT is computed using Algorithm 2, where the **while** loop keeps shrinking the box until a barrier tube is found; (2) in line 8, the initial set *X<sup>0</sup>* is updated for the next box; (3) in line 9, a new box is constructed to contain *X<sup>0</sup>*, and the process repeats.
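The outer loop just described can be sketched as follows (all helper functions are hypothetical stand-ins for Algorithms 1 and 2):

```python
def compute_pbt(X0, length, make_box, find_bt, shrink, next_init):
    """Chain barrier-tube computations: each box's exit region seeds the
    next flowpipe segment.
    make_box(X0)        -> (E, P, G)   # Algorithm 1
    find_bt(X0, E, G)   -> BT or None  # Algorithm 2
    shrink(E)           -> (E, P, G)   # smaller box when no tube is found
    next_init(bt, E, P) -> X0'         # rectangular enclosure of the exit"""
    pbt = []
    E, P, G = make_box(X0)
    for _ in range(length):
        bt = find_bt(X0, E, G)
        while bt is None:              # no tube found: shrink the box and retry
            E, P, G = shrink(E)
            bt = find_bt(X0, E, G)
        pbt.append(bt)
        X0 = next_init(bt, E, P)       # exit rectangle becomes the next initial set
        E, P, G = make_box(X0)
    return pbt
```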

*Example 5 (Running example).* Let us consider the running example again. We set the length of the PBT to 45, and the PBT we obtained is shown in Fig. 2b. Compared to the interval over-approximation of the Taylor model obtained using *Flow\**, the computed PBT consists of significantly fewer segments and is more precise owing to the absence of stretching.

**Safety Verification Based on PBT.** The idea of safety verification based on PBT is straightforward. Given an unsafe set *X<sup>U</sup>*, for each intermediate initial set *X<sup>0</sup>* and the corresponding enclosure box E, we first check whether *X<sup>U</sup>* ∩ E = ∅. If the intersection is not empty, we further attempt to find a barrier certificate between *X<sup>U</sup>* and the flowpipe of *X<sup>0</sup>* inside E. If the intersection is empty or a barrier is found, we continue to compute a longer PBT. The PBT computation can be refined by using a smaller E and a higher degree d for the template polynomial.

#### **Algorithm 3.** Algorithm to compute PBT

**Table 1.** Model definitions
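The emptiness check $X_U \cap E = \emptyset$ in the safety procedure is a plain interval test when the unsafe set is (or is enclosed by) a box; a minimal sketch:

```python
def boxes_intersect(a, b):
    """Two axis-aligned boxes, each given as a list of (lo, hi) bounds per
    dimension, intersect iff their projections overlap in every dimension."""
    return all(lo1 <= hi2 and lo2 <= hi1
               for (lo1, hi1), (lo2, hi2) in zip(a, b))
```

Only when this cheap test reports an overlap does the more expensive barrier-certificate search inside E need to be invoked.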

#### **5 Implementation and Experiments**

We have implemented the proposed approach as a C++ prototype called Piecewise Barrier Tube Solver (*PBTS*), choosing *Gurobi* [12] as our internal linear programming solver. We have also performed some experiments on a benchmark of four nonlinear polynomial dynamical systems (described in Table 1) to compare the efficiency and the effectiveness of our approach w.r.t. other tools. Our experiments were performed on a desktop computer with a 3.6 GHz *Intel Core i7-7700* 8 Core CPU and 32 GB memory. The results are presented in Table 2.

*Remark 6.* There are a number of outstanding tools for flowpipe computation [1,3–5]. Since our approach performs flowpipe computation for polynomial nonlinear systems, we pick two of the most relevant state-of-the-art tools for comparison: *CORA* [1] and *Flow\** [3]. Note that a big difference between our approach and the other two is that *PBTS* is time-independent, which means that we cannot compare *PBTS* with *CORA* or *Flow\** over exactly the same time horizon. To be fair, for *Flow\** and *CORA* we have used the same time horizon for the flowpipe computation, while we have computed a slightly longer flowpipe using *PBTS*. To guide the reader, we have also used different plotting colors to visualize the difference between the flowpipes obtained from the three tools.

**Table 2.** Tool comparison on nonlinear systems. #var: number of variables; T: computing time; NFS: number of flowpipe segments; DEG: candidate degrees for the template polynomial (only for *PBTS*); TH: time horizon for the flowpipe (only for *Flow\** and *CORA*); FAIL: failed to terminate within 30 min.

**Evaluation.** As pointed out in Sect. 1, a common problem with bounded-time-integration-based approaches is that a flowpipe segment of a dynamical system can be stretched extremely with time, so that the interval over-approximation of the flowpipe segment is very conservative and the solver usually has to stop prematurely due to error explosion. This can be seen easily in Figs. 4, 5, 6 and 7. In particular, for *Controller 2D*, *Flow\** gives quite a nice result in the beginning but starts producing an exploding flowpipe very quickly (note that *Flow\** offers options to produce better plots, which however are expensive and were not used for safety verification; *CORA* even failed to give a result after over 30 min of running). This phenomenon reappeared with both *Flow\** and *CORA* for *Controller 3D*. Notice that most of the time horizons used in the experiments are essentially the limits that *Flow\** and *CORA* can reach, i.e., a slightly larger time horizon would cause the solvers to fail. In comparison, our tool has no such problem and can sustain a much longer flowpipe before exploding, or even without exploding, as shown in Fig. 4a.

Another important factor is efficiency. As shown in Table 2, our approach is more efficient than the other two tools on the first three examples but slower on the last one. The reason is that the degree d of the template polynomial used in the last example is higher than in the others, and increasing d increases the number of decision variables in the linear constraints. This suggests that using a smaller d on shorter flowpipe segments would be better. In addition, Table 2 also shows that the number of flowpipe segments produced by *PBTS* is much smaller than that produced by *Flow\** and *CORA*. In this respect, *PBTS* would be more efficient for safety verification.

**Fig. 4.** Flowpipe for Controller 2D.

**Fig. 5.** Flowpipe for Van der Pol Oscillator.

**Fig. 6.** Flowpipe for Lotka-Volterra.

**Fig. 7.** Flowpipe (projection) for Controller 3D.

### **6 Conclusion**

We have presented PBTS, a novel approach for over-approximating flowpipes of nonlinear systems with polynomial dynamics. The benefit of using BTs is that they are time-independent and hence cannot be stretched or deformed by time. Moreover, the approach produces only a small number of BTs, which suffice to form a tight over-approximation of the flowpipe; hence safety verification with PBTs can be very efficient.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **Space-Time Interpolants**

Goran Frehse<sup>1</sup>, Mirco Giacobbe2(B), and Thomas A. Henzinger<sup>2</sup>

<sup>1</sup> Univ. Grenoble Alpes, CNRS, Grenoble INP, VERIMAG, Grenoble, France <sup>2</sup> IST Austria, Klosterneuburg, Austria mgiacobbe@ist.ac.at

**Abstract.** Reachability analysis is difficult for hybrid automata with affine differential equations, because the reach set needs to be approximated. Promising abstraction techniques usually employ interval methods or template polyhedra. Interval methods account for dense time and guarantee soundness, and there are interval-based tools that overapproximate affine flowpipes. But interval methods impose bounded and rigid shapes, which make refinement expensive and fixpoint detection difficult. Template polyhedra, on the other hand, can be adapted flexibly and can be unbounded, but sound template refinement for unbounded reachability analysis has been implemented only for systems with piecewise constant dynamics. We capitalize on the advantages of both techniques, combining interval arithmetic and template polyhedra, using the former to abstract time and the latter to abstract space. During a CEGAR loop, whenever a spurious error trajectory is found, we compute additional space constraints and split time intervals, and use these *space-time interpolants* to eliminate the counterexample. Space-time interpolation offers a lazy, flexible framework for increasing precision while guaranteeing soundness, both for error avoidance and fixpoint detection. To the best of our knowledge, this is the first abstraction refinement scheme for the reachability analysis over *unbounded* and *dense* time of affine hybrid systems, which is both *sound* and *automatic*. We demonstrate the effectiveness of our algorithm with several benchmark examples, which cannot be handled by other tools.

### **1 Introduction**

Formal verification techniques can be used to either provide rigorous guarantees about the behaviors of a critical system, or detect instances of violating behavior if such behaviors are possible. Formal verification has become widely used in the design of software and digital hardware, but has yet to show a similar success for physical and cyber-physical systems. One of the reasons for this is a scarcity of suitable algorithmic verification tools, such as model checkers, which are formally sound, precise, and scale reasonably well. In this paper, we propose a novel verification algorithm that meets these criteria for systems with piecewise affine dynamics. The performance of the approach is illustrated experimentally on a number of benchmarks. Since systems with affine dynamics have been studied before, we first describe why the available methods and tools do not handle this class of systems sufficiently well, and then describe our approach and its core contributions.

*Previous Approaches.* The algorithmic verification of systems with continuous or discrete-continuous (hybrid) dynamics is a hard problem both in theory and practice. For piecewise constant dynamics (PCD), the continuous successor states (a.k.a. flow pipe) can be computed exactly, and the complexity is exponential in the number of variables [17,19]. While in principle, any dynamics can be approximated arbitrarily well by PCD systems using an approach called hybridization [20], this requires partitioning of the state space, which often leads to prohibitive computational costs. For piecewise affine dynamics (PWA), one-step successors can be computed approximately using complex set representations. However, all published approaches suffer either from a possibly exponential increase in the complexity of the set representation, or from a possibly exponential increase in the approximation error as the considered time interval increases; this will be argued in detail in Sect. 4.

In addition to these theoretical obstacles, we note the following practical obstacles for the available tools and their performance in experiments. The only available model checkers that are (i) *sound* (i.e., they compute provable dense-time overapproximations), (ii) *unbounded* (i.e., they overapproximate the flowpipe for an infinite time horizon), and (iii) *arbitrarily precise* (i.e., they support precision refinement) are, with one exception, limited to PCD systems, namely, HyTech [18], PHAVer [13], and Lyse [7]. The tool Ariadne [6] can deal with affine dynamics and is sound, unbounded, and precise. However, Ariadne discretizes the reachable state space with a rectangular grid. This invariably leads to an exponential complexity in terms of the number of variables. Other tools that are applicable to PWA systems do not meet our criteria in that they are either not formally sound (e.g., CORA [2], SpaceEx [15]), not arbitrarily precise because of templates or particular data structures (e.g., SpaceEx, Flow<sup>∗</sup> [8], CORA), or limited to bounded model checking (e.g., dReach [24], Flow∗). All the above tools exhibit fatal limitations in scalability or precision on standard PWA benchmarks; they typically work only on well-chosen examples. Note that while these tools do not meet the criteria we advance in this paper, they of course have strengths in other areas, such as handling nonlinear and nondeterministic dynamics.

*Our Approach.* We view iterative abstraction refinement as critical for soundness and precision management, and fixpoint detection as critical for evaluating unbounded properties. We implement, for the first time, a CEGAR (counterexample-guided abstraction refinement) scheme in combination with a fixpoint detection criterion for PWA systems. Our abstraction refinement scheme manages complexity and precision trade-offs in a flexible way by decoupling time from space: the dense timeline is partitioned into a sequence of intervals that are refined individually and lazily, by splitting intervals, to achieve the necessary precision and detect fixpoints; state sets are overapproximated using template polyhedra that are also refined individually and lazily, by adding normal directions to templates; and both refinement processes are interleaved for optimal results, while maintaining soundness with each step. A similar approach was recently proposed for the limited class of PCD systems [7]; this paper can be seen as an extension of the approach to the class of piecewise affine dynamics.

With each iteration of the CEGAR loop, a spurious counterexample is removed by computing a proof of infeasibility in terms of a sequence of linear constraints in space and interval constraints in time, which we call a sequence of *space-time interpolants*. We use linear programming to construct a suitable sequence of space-time interpolants and check for fixpoints. If a fixpoint check fails, we increase the time horizon by adding new intervals. The separation of time from space gives us the flexibility to explore different refinement strategies. Fine-tuning the iteration of space refinement (adding template directions), time refinement (splitting intervals), and fixpoint checking (adding intervals), we find that it is generally best to prefer fewer time intervals over fewer space constraints. Based on performance evaluation, we even expand individual time intervals when this is possible without sacrificing the precision necessary for removing a counterexample.

#### **2 Motivating Example**

The ordinary differential equation over the variables x and y

$$\begin{array}{l}\dot{x} = 0.1x - y + 1.8\\\dot{y} = x + 0.1y - 2.2\end{array} \tag{1}$$

moves counterclockwise around the point (2, 2) in an outward spiral. We center a box B (of side 0.92) on the same point and place a diagonal segment S close to the bottom right corner of B, without touching it (between (2, 1) and (3.5, 2); see Fig. 1). Then, we consider the problem of proving that every trajectory starting from any point in S never hits B. This is a time-unbounded reachability problem for a hybrid automaton with piecewise affine dynamics and two control modes. The first mode has the dynamics above (Eq. 1) and S as initial region. It has a transition to a second mode, which in turn has B as invariant. The second mode is a bad mode, which all trajectories indeed avoid.

We tackle the reachability problem by abstraction refinement. In particular, we aim at automatically constructing an enclosure for the flowpipe—i.e., for the set of trajectories from S—which (i) avoids the bad state B and (ii) covers the continuous timeline up to infinity. Figure 1 shows three abstractions that result from different strategies for refining an initial space partition (i.e., template) and time partition (i.e., sequence of time intervals). All three refinement schemes start by enclosing S with an initial template polyhedron P, and then transforming P into a sequence of abstract flowpipe sections intflow$_{[\underline{t},\overline{t}]}$(P), one for each interval $[\underline{t},\overline{t}]$ of an initial partitioning of the unbounded timeline. The computation of new flowpipe sections stops when a fixpoint is reached, i.e., when we reach a time threshold $t^*$ whose flowpipe section closes a cycle with intflow$_{t^*}$(P) ⊆ P, a sufficient condition for any further flowpipe section to be contained in the union of the previously computed sections.

**Fig. 1.** Comparison of abstraction refinement methods for the ODE in Eq. 1, the segment S as initial region, and the box B as bad region. The polyhedron P is the template polyhedron of S, and the gray polyhedra are the flowpipe sections intflow$_{[\underline{t},\overline{t}]}$(P).

Refinement scheme (a) sticks to a fixed octagonal template P—i.e., to the normals of a regular octagon—and iteratively halves all time intervals until every flowpipe section avoids the bad set B. This is achieved at interval width 1/64, but the computation does not terminate because no fixpoint is reached. Refinement scheme (b) splits time similarly but also computes a different, more accurate template for every iteration: first, an interval $[\underline{t},\overline{t}]$ is halved until it admits a halfspace interpolant, i.e., a halfspace H such that S ⊆ H and intflow$_{[\underline{t},\overline{t}]}$(H) ∩ B = ∅; then, a maximal set of linearly independent directions is chosen as template from the normals of the obtained halfspaces. Refinement scheme (b) succeeds at interval width 1/16 to avoid B and reach a fixpoint; the latter at time 6.25, with intflow$_{6.25}$(P) ⊆ P. Refinement scheme (c) modifies (b) by optimizing the refinement of the time partition: instead of halving time intervals, the maximal intervals that admit halfspace interpolants are chosen. This scheme produces a nonuniform time partitioning with an average interval width of about 1/8, discovers five template directions, and finds a fixpoint in fewer steps.

Each iteration of the abstraction refinement loop consists of first abstracting the initial region into a template polyhedron, second solving the differential equation into a sequence of interval matrices, and finally transforming the template polyhedron using each of the interval matrices. We represent each transformation symbolically, by means of its support function. Then, we verify (i) the separation between every support function and the bad region, and (ii) the containment of any support function in the initial template polyhedron. The separation problem amounts to solving one LP, and the inclusion problem amounts to solving an LP in each template direction. If the separation fails, then we independently bisect each time interval that does not admit a halfspace interpolant and expand each interval that does, until all are proven separated. Together, these halfspace interpolants form an infeasibility proof for the counterexample: a space-time interpolant. We forward the resulting new time intervals and halfspaces to the abstraction generator, and repeat, using the refined partitioning and the augmented template. If the inclusion fails, then we increase the time horizon by some amount Δ, and repeat. Once we succeed with both separation and inclusion, the system is proved safe.
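Both checks reduce to support-function LPs over template polyhedra in constraint form {x | Dx ≤ c}; a sketch assuming SciPy's LP solver (the paper's own implementation is not shown here):

```python
import numpy as np
from scipy.optimize import linprog

def support(D, c, d):
    """Support function rho(d) = max d.x over the polytope {x | Dx <= c}:
    a single LP (the 'separation' building block)."""
    res = linprog(-np.asarray(d, float), A_ub=D, b_ub=c,
                  bounds=[(None, None)] * len(d))
    return -res.fun

def contained(D1, c1, D2, c2):
    """P1 = {x | D1 x <= c1} is included in P2 = {x | D2 x <= c2} iff,
    for every template direction of P2, the support of P1 does not
    exceed P2's offset: one LP per direction."""
    return all(support(D1, c1, d) <= ci + 1e-9 for d, ci in zip(D2, c2))

# axis-aligned template (a box): unit box vs. box of side 4
Dbox = np.array([[1, 0], [-1, 0], [0, 1], [0, -1]], float)
```

With this primitive, the fixpoint check intflow$_{t^*}$(P) ⊆ P is exactly a `contained` query, one LP per template direction.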

This example shows the advantage of lazily refining *both* the space partitioning (i.e., the template), by adding directions, and the time partitioning, by splitting intervals.

#### **3 Hybrid Automata with Piecewise Affine Dynamics**

A hybrid automaton with piecewise affine dynamics consists of an n-dimensional vector x of real-valued variables and a finite directed multigraph (V, E), the control graph. We call the vertices v ∈ V the control modes and the edges e ∈ E the control switches. We decorate each mode v ∈ V with an initial condition Z_v ⊆ ℝ^n, a nonnegative invariant condition I_v ⊆ ℝ^n_{≥0}, and a flow condition given by the system of ordinary differential equations

$$
\dot{x} = A\_v x + b\_v. \tag{2}
$$

We decorate each switch e ∈ E with a guard condition G_e ⊆ ℝ^n and an update condition given by the difference equation x := R_e x + s_e. All constraints I, G, and Z are conjunctions of rational linear inequalities, A and R are constant matrices, and b and s are constant vectors of rational coefficients. In this paper, whenever an indexing of modes and switches is clear from the context, we index the respective constraints and transformations similarly, e.g., we abbreviate A_{v_i} with A_i.

A trajectory is a possibly infinite sequence of states (v, x) ∈ V × ℝ^n, repeatedly interleaved first by a switching time t ∈ ℝ_{≥0} and then by a switch e ∈ E

$$(v\_0, x\_0)t\_0(v\_0, y\_0)e\_0(v\_1, x\_1)t\_1(v\_1, y\_1)e\_1\dots \tag{3}$$

for which there exists a sequence of solutions ψ_0, ψ_1, ... : ℝ → ℝ^n such that ψ_i(0) = x_i, ψ_i(t_i) = y_i, and they satisfy (i) the invariant conditions ψ_i(t) ∈ I_i and (ii) the flow conditions ψ̇_i(t) = A_i ψ_i(t) + b_i, for all t ∈ [0, t_i]. Moreover, x_0 ∈ Z_0, every switch e_i has source v_i and destination v_{i+1}, and the respective states satisfy (i) the guard condition y_i ∈ G_i and (ii) the update x_{i+1} = R_i y_i + s_i. The maximal set of its trajectories is the semantics of the hybrid automaton, which is safe if none of them contains a special bad mode.

Every hybrid automaton with affine dynamics can be transformed into an equivalent hybrid automaton with linear dynamics, i.e., the special case where b = 0 on every mode. We obtain such a transformation by adding one extra variable y, rewriting the flow of every mode into ẋ = Ax + by, and forcing y to always equal 1, i.e., invariant y = 1 and flow ẏ = 0 on every mode, and update y := y on every switch. For this reason, in the following sections we discuss w.l.o.g. the reachability analysis of hybrid automata with linear dynamics.
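This homogenization amounts to augmenting the system matrix with an extra row and column; a minimal numpy sketch (the matrix values are purely illustrative):

```python
import numpy as np

def homogenize(A, b):
    """Embed the affine flow x' = A x + b into the linear flow
    z' = A_lin z over z = (x, y), where the extra variable y
    has flow y' = 0 and is initialized to 1."""
    n = A.shape[0]
    A_lin = np.zeros((n + 1, n + 1))
    A_lin[:n, :n] = A   # original linear part
    A_lin[:n, n] = b    # b enters through the constant variable y
    # last row stays zero: y' = 0, so the invariant y = 1 is preserved
    return A_lin

# illustrative 2-dimensional affine system
A = np.array([[0.0, 1.0], [-1.0, 0.0]])
b = np.array([0.5, -0.25])
A_lin = homogenize(A, b)

x = np.array([2.0, 3.0])
z = np.append(x, 1.0)   # augmented state with y = 1
# the augmented derivative projects back to A x + b (and y' = 0)
assert np.allclose(A_lin @ z, np.append(A @ x + b, 0.0))
```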

#### **4 Time Abstraction Using Interval Arithmetic**

We abstract the reach set of the hybrid automaton with a union of convex polyhedra. In particular, we abstract the states that are reachable in a mode using a finite sequence of images of the initial region over a *time partitioning*, until a completeness threshold is reached. Thereafter, we compute the *template polyhedron* of each of the images that can take a switch. Then, we repeat in the destination mode and we continue until a fixpoint is found.

Precisely, a time partitioning T is a (possibly infinite) set of disjoint closed time intervals whose union is a single (possibly open) interval. For a finite set of directions D ⊆ ℝ^n, the D-polyhedron of a closed convex set X is the tightest polyhedral enclosure whose facet normals are in D. In the following, we associate every mode v with a template D_v and a time partitioning T_v of the time axis ℝ_{≥0}; we employ interval arithmetic for abstracting the continuous dynamics (Sect. 4.1), and on top of it we develop a procedure for hybrid dynamics (Sect. 4.2).

#### **4.1 Continuous Dynamics**

We consider w.l.o.g. a mode with ODE reduced to the linear form ẋ = A_v x, invariant I_v, and a given time interval [t̲, t̄]. Every linear ODE ẋ = Ax has the unique solution

$$
\psi(t) = \exp(At)\psi(0). \tag{4}
$$
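Equation (4) can be exercised numerically; a quick illustrative check comparing the closed form against a crude forward-Euler integration of the same ODE (the dynamics chosen here are a harmonic oscillator, not an example from the paper):

```python
import numpy as np
from scipy.linalg import expm

# illustrative dynamics x' = A x: a harmonic oscillator
A = np.array([[0.0, 1.0], [-1.0, 0.0]])
x0 = np.array([1.0, 0.0])
t = 2.0

closed_form = expm(A * t) @ x0   # psi(t) = exp(A t) psi(0)

# crude forward-Euler integration for comparison
steps = 20000
dt = t / steps
x = x0.copy()
for _ in range(steps):
    x = x + dt * (A @ x)

assert np.allclose(closed_form, x, atol=1e-2)
```

For this A, the exact solution is (cos t, −sin t), which both computations approach.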

It follows (see also [16]) that the set of states reachable in v after exactly t time units from an initial region X is

$$\text{flow}\_v^t(X) \stackrel{\text{def}}{=} \exp(A\_v t)X \cap \bigcap\_{0 \le \tau \le t} \exp(A\_v(t-\tau))I\_v.\tag{5}$$

Then, the flowpipe section over the time interval [t,t] is

$$\text{flow}\_{v}^{[\underline{t},\overline{t}]}(X) \stackrel{\text{def}}{=} \cup \{ \text{flow}\_{v}^{t}(X) \mid t \in [\underline{t},\overline{t}] \}. \tag{6}$$

We note three straightforward but consequential properties of the reach set: (i) the accuracy of any convex abstraction depends on the size of the time interval: while flow_v^t(X) is convex for convex X, this is generally not the case for flow_v^[t̲,t̄](X); (ii) we can prune the time interval whenever we detect that the reach set no longer overlaps with the invariant: if for any t* ≥ 0, flow_v^{t*}(X) = ∅, then for all t ≥ t*, flow_v^t(X) = ∅ and flow_v^[t̲,t̄](X) = flow_v^[t̲,t*](X); (iii) we can prune the time interval whenever we detect containment in the initial states: if flow_v^{t*}(X) ⊆ X, then flow_v^[t̲,∞](X) = flow_v^[t̲,t*](X).

For given A and t, the matrix exp(At) can be computed with arbitrary, but only finite, accuracy. We resolve this problem by computing a rational interval matrix [M̲, M̄], which we denote intexp(A, t̲, t̄), such that for all t ∈ [t̲, t̄] we have, element-wise, that

$$\exp(At) \in \text{intexp}(A, \underline{t}, \overline{t}). \tag{7}$$

This interval matrix can be derived efficiently with a variety of methods [25], e.g., using a guaranteed ODE solver or using interval arithmetic. The width of the interval matrix can be made arbitrarily small at the price of increasing the number of computations and the size of the representation of the rational numbers. In our approach, we do not rely on a fixed accuracy of the interval matrix. Instead, we require that the accuracy increase as the width of the time interval goes to zero. That way, we do not need to introduce an extra parameter. To ensure progress in our refinement loop, we require that the interval matrix decrease monotonically when we split the time interval. Formally, if [t̲, t̄] ⊆ [u̲, ū], we require the element-wise inclusion intexp(A, t̲, t̄) ⊆ intexp(A, u̲, ū). This can be ensured by intersecting the interval matrices with the original interval matrix after time splitting.
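The paper leaves the choice of enclosure method open. One simple (if crude) scheme, ignoring the floating-point error of the matrix exponential itself and the monotone-intersection step described above, samples the interval on a grid and pads every entry by a Lipschitz bound, since |d/dt (exp(At))_ij| ≤ ‖A exp(At)‖₂ ≤ ‖A‖₂ e^{‖A‖₂ t̄}; a sketch under these assumptions:

```python
import numpy as np
from scipy.linalg import expm

def intexp(A, t_lo, t_hi, samples=64):
    """Entrywise enclosure [M_lo, M_hi] of exp(A t) for t in [t_lo, t_hi].
    Every entry of exp(A t) is Lipschitz in t with constant
    L = ||A||_2 * exp(||A||_2 * t_hi), so sampling on a grid of spacing
    delta and padding by L * delta / 2 yields an enclosure
    (up to floating-point error in expm)."""
    norm_A = np.linalg.norm(A, 2)
    L = norm_A * np.exp(norm_A * t_hi)        # entrywise Lipschitz constant
    ts = np.linspace(t_lo, t_hi, samples)
    pad = L * (ts[1] - ts[0]) / 2 if samples > 1 else L * (t_hi - t_lo)
    mats = np.stack([expm(A * t) for t in ts])
    return mats.min(axis=0) - pad, mats.max(axis=0) + pad

A = np.array([[0.0, 1.0], [-2.0, -0.5]])
M_lo, M_hi = intexp(A, 0.25, 0.5)
# any concrete exp(A t) with t in the interval lies inside the enclosure
for t in np.linspace(0.25, 0.5, 17):
    M = expm(A * t)
    assert (M_lo <= M + 1e-9).all() and (M - 1e-9 <= M_hi).all()
```

A production implementation would instead use rigorous interval arithmetic (as the paper does via Arb) so that rounding is also accounted for.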

While the mapping with interval matrices is in general not convex [29], we can simplify the problem by assuming that all points of X are in the positive orthant. As long as X is bounded from below, this condition can be satisfied by an appropriate change of coordinates. Under the assumption that X ⊆ ℝ^n_{≥0},

$$[\underline{M}, \overline{M}](X) = \left\{ y \in \mathbb{R}^n \mid \underline{M}x \le y \le \overline{M}x \text{ and } x \in X \right\}.\tag{8}$$
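For sets in the positive orthant, Eq. (8) also yields a direct support-function formula: splitting d = d⁺ − d⁻, the optimal y picks M̄x on positive coordinates of d and M̲x on negative ones, so ρ over [M̲, M̄](X) in direction d equals ρ_X(M̄ᵀd⁺ − M̲ᵀd⁻). A brute-force check of this observation on the unit box (all names and values are ours, for illustration only):

```python
import numpy as np
from itertools import product

def support_box(c):
    """Support function of the unit box [0, 1]^n in direction c."""
    return np.maximum(c, 0.0).sum()

def support_interval_map(M_lo, M_hi, d):
    """Support function of [M_lo, M_hi](X) for X = [0, 1]^n (positive
    orthant): reduce to a support query on X with a transformed direction."""
    d_pos, d_neg = np.maximum(d, 0.0), np.maximum(-d, 0.0)
    return support_box(M_hi.T @ d_pos - M_lo.T @ d_neg)

M_lo = np.array([[0.5, -1.0], [0.0, 0.25]])
M_hi = np.array([[1.0, -0.5], [0.5, 1.0]])
d = np.array([1.0, -2.0])

# brute force: the optimum is attained at a box vertex x and, entrywise,
# at y_i = M_hi[i] @ x if d_i > 0 else M_lo[i] @ x (valid since x >= 0)
best = -np.inf
for x in product([0.0, 1.0], repeat=2):
    x = np.array(x)
    y = np.where(d > 0, M_hi @ x, M_lo @ x)
    best = max(best, d @ y)

assert np.isclose(support_interval_map(M_lo, M_hi, d), best)
```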

Combining the above results, we obtain a convex abstraction of the flowpipe over a time interval as

$$\text{intflow}\_{v}^{[\underline{t},\overline{t}]}(X) \stackrel{\text{def}}{=} \text{intexp}(A, \underline{t}, \overline{t})X \cap I\_{v}. \tag{9}$$

The abstraction is conservative in the sense that flow_v^[t̲,t̄](X) ⊆ intflow_v^[t̲,t̄](X). On the other hand, the longer the time interval, the coarser the abstraction. For this reason, we construct an abstraction of the flowpipe in terms of a union of convex approximations over a time partitioning. The abstract flowpipe over the time partitioning T is

$$\text{intflow}\_{v}^{T}(X) \stackrel{\text{def}}{=} \cup \{ \text{intflow}\_{v}^{[\underline{t},\overline{t}]}(X) \mid [\underline{t},\overline{t}] \in T \}. \tag{10}$$

Again, this is conservative w.r.t. the concrete flowpipe, i.e., for all time partitionings T it holds that flow_v^{∪T}(X) ⊆ intflow_v^T(X). Moreover, it is conservative w.r.t. any refinement of T: if the time partitioning U refines T, i.e., ∪U = ∪T and ∀[u̲, ū] ∈ U : ∃[t̲, t̄] ∈ T : [u̲, ū] ⊆ [t̲, t̄], then intflow_v^U(X) ⊆ intflow_v^T(X).

#### **4.2 Hybrid Dynamics**

We embed the flowpipe abstraction routine into a reachability algorithm that accounts for the switching induced by the hybrid automaton. The discrete post operator is the image of a set Y ⊆ ℝ^n through a switch e ∈ E

$$\text{jump}\_e(Y) \stackrel{\text{def}}{=} R\_e(Y \cap G\_e) \oplus \{s\_e\}. \tag{11}$$

We explore the hybrid automaton by constructing a set of abstract trajectories, namely sequences of abstract states interleaved by time intervals and switches

$$(v\_0, X\_0)[\underline{t}\_0, \overline{t}\_0](v\_0, Y\_0)e\_0(v\_1, X\_1)[\underline{t}\_1, \overline{t}\_1](v\_1, Y\_1)e\_1\dots \tag{12}$$

**input** : Template {D_v} and partitioning {T_v} indexed by V **output**: Optionally an abstract trajectory (counterexample)

```
1  foreach v ∈ V with nonempty Z_v do
2      push (v, Z_v)[0, Δ] into the stack W;
3      add the D_v-polyhedron of Z_v to P_v;
4  while W is not empty do
5      pop ...(v, X)[t̲, t̄] from W;
6      P ← D_v-polyhedron of X;
7      if v is bad and P ∩ I_v is nonempty then           // check counterexample
8          return ...(v, X);
9      foreach t* ∈ {t̲ + δ, t̲ + 2δ, ..., t̄} do           // find completeness threshold
10         if intflow_v^{t*}(P) ⊆ P_v then break;
11     if t* = t̄ and intflow_v^{t̄}(P) ⊈ P_v then         // otherwise extend time horizon
12         push ...(v, X)[t̄, t̄ + Δ] into W;
13     foreach [u̲, ū] ∈ T_v with [u̲, ū] ∩ [t̲, t*] ≠ ∅ do  // construct flowpipe
14         Y ← intflow_v^{[u̲,ū]}(P);
15         foreach e ∈ E with source v and destination v′ do
16             X′ ← jump_e(Y);
17             if X′ ⊆ P_{v′} then continue;
18             push ...(v, X)[u̲, ū](v, Y)e(v′, X′)[0, Δ] into W;
19             add the D_{v′}-polyhedron of X′ to P_{v′};
```
**Algorithm 1.** Reachability procedure.

where X_0, Y_0, ... ⊆ ℝ^n are nonempty sets of states that comply with the template {D_v} and partitioning {T_v} in the following sense. First, X_0 = Z_0 and X_{i+1} = jump_i(Y_i) for all i ≥ 0. Second, Y_i = intflow_i^{[t̲_i,t̄_i]}(P_i) for all i ≥ 0, where P_i is the D_i-polyhedron of X_i and [t̲_i, t̄_i] ∈ T_i. The maximal set of abstract trajectories, the abstract semantics induced by {D_v} and {T_v}, overapproximates the concrete semantics in the sense that every concrete trajectory (see Eq. 3) has an abstract trajectory that subsumes it, i.e., modes and switches match, x_i ∈ X_i, t_i ∈ [t̲_i, t̄_i], and y_i ∈ Y_i, for all i ≥ 0.

Computing the abstraction involves several difficulties. First, the trajectories might not be finitary. Indeed, this is unsolvable in theory, because the reachability problem is undecidable [21]. Second, the post operators are hard to compute. In particular, obtaining the sets X and Y in terms of conjunctions of linear inequalities in ℝ^n requires eliminating quantifiers. In Algorithm 1, we present a procedure (which does not necessarily terminate) for tackling the first problem. In the next section, we show how to tackle the second using support functions.

We employ Algorithm 1 to explore the tree of abstract trajectories. We store in the stack W the leaves to process ...(v, X), followed by a candidate interval [t̲, t̄]. For each leaf, we retrieve P, the template polyhedron of X. If it leads to a bad mode, we return; otherwise we search for a completeness threshold t* between t̲ (excluded) and t̄, checking for inclusion in the union of visited polyhedra P_v. In case of failure, we extend the time horizon by Δ and push the next candidate onto the stack. Then, we partition the time between t̲ and t*, construct the flowpipe, and process switching. Upon each successful switch, we augment P_{v′} with the D_{v′}-polyhedron of the switching region X′, avoiding the storage of redundant polyhedra. Notably, the latter operation is efficient because all polyhedra comply with the same template. For the same reason, we obtain efficient inclusion checks, which we implement by first computing the template polyhedron of the left-hand side, and then comparing the constant terms of the respective linear inequalities.

In conclusion, this reachability procedure takes a template {D_v} and a partitioning {T_v} and constructs a tree of reachable sets of states X and Y. It manipulates them through the post operators and overapproximates them into template polyhedra. In the next section, we discuss how to represent X and Y efficiently, so as to efficiently compute their template polyhedra. In Sect. 6 we discuss how to discover appropriate {D_v} and {T_v}, so as to eliminate spurious counterexamples.

#### **5 Space Abstraction Using Support Functions**

Abstracting away time left us with the task of representing the state space of the hybrid automaton, namely the space of its variable valuations. Such sets consist of polyhedra emerging from operations such as intersections, Minkowski sums, and linear maps with simple or interval matrices. In this section, we discuss how to represent precisely all sets emerging from any of these operations by means of their support functions (Sect. 5.1) and then how to abstract them into template polyhedra (Sect. 5.2). In the next section, we discuss how to refine the abstraction.

#### **5.1 Support Functions**

The support function of a closed convex set X ⊆ ℝ^n in direction d ∈ ℝ^n is the supremum of the scalar product of d over X

$$\rho\_X(d) = \sup\{d^\top x \mid x \in X\},\tag{13}$$

and, indeed, it uniquely represents any closed convex set [28]. Classic work on the verification of hybrid automata with affine dynamics has posed a framework for the construction of support functions from basic set operations, but under the assumption of boundedness and nonemptiness of the represented set, and with approximated intersection [16]. Indeed, if the set is empty then its support function is −∞, while if it is unbounded and d points toward a direction of recession it is +∞, making the framework end up with undefined values. Such conditions turn out to be limiting in our context, first because we find it desirable to represent unbounded sets so as to accelerate the convergence of the abstraction procedure to a fixpoint, but most importantly because when encoding support functions for long abstract trajectories we might not know whether their concretization is infeasible. Checking this is a crucial element of a counterexample-guided abstraction refinement routine.

Recent work on the verification of hybrid automata with constant dynamics, i.e., with flows defined by constraints on the derivative only, provides us with a generalization of the classic support function framework which relaxes away the assumptions of boundedness and nonemptiness and yields precise intersection [7]. The framework encodes combinations of convex sets of states into LPs (linear programs) which enjoy strong duality with their support functions. Similarly, we encode the support function in direction d of any set X into the LP

$$\begin{array}{ll}\text{minimize} & c^{\mathsf{T}}\lambda\\\text{subject to } A\lambda = Bd,\end{array} \tag{14}$$

over the nonnegative vector of variables λ. The LP is dual to ρ_X(d), which is to say that if the LP is infeasible then X is unbounded in direction d, and if the LP is unbounded then X is the empty set. Moreover, if the LP has a bounded solution, then so does ρ_X(d), and the solutions coincide.

The construction is inductive on operations between sets. For the base case, we recall that, by duality of linear programming, the support function of a polyhedron given by a system of inequalities Px ≤ q is dual to the LP over λ ≥ 0

$$\begin{array}{ll}\text{minimize} & q^{\mathsf{T}}\lambda\\\text{subject to} & P^{\mathsf{T}}\lambda = d.\end{array} \tag{15}$$
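This base-case duality is easy to exercise numerically; a sketch with scipy.optimize.linprog (the polyhedron and direction are illustrative), comparing the primal maximization of dᵀx over Px ≤ q with the dual LP of Eq. (15):

```python
import numpy as np
from scipy.optimize import linprog

# illustrative polyhedron P x <= q: the triangle x >= 0, y >= 0, x + y <= 2
P = np.array([[-1.0, 0.0], [0.0, -1.0], [1.0, 1.0]])
q = np.array([0.0, 0.0, 2.0])
d = np.array([1.0, 3.0])

# primal: rho_X(d) = max d^T x subject to P x <= q
primal = linprog(-d, A_ub=P, b_ub=q, bounds=[(None, None)] * 2)

# dual (Eq. 15): min q^T lambda subject to P^T lambda = d, lambda >= 0
dual = linprog(q, A_eq=P.T, b_eq=d, bounds=[(0, None)] * 3)

assert primal.status == 0 and dual.status == 0
assert np.isclose(-primal.fun, dual.fun)   # strong duality: both equal rho_X(d)
```

Here both LPs evaluate to 6, attained at the vertex (0, 2).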

Then, inductively, we assume that for the set X ⊆ ℝ^n we are given an LP with the coefficients A_X, B_X, and c_X, and similarly for the set Y ⊆ ℝ^n. For the support functions of X ⊕ Y, MX, and X ∩ Y we respectively construct the following LPs over the nonnegative vectors of variables λ, μ, α, and β:

$$\begin{array}{ll}\text{minimize} & c\_X^\mathsf{T}\lambda + c\_Y^\mathsf{T}\mu\\ \text{subject to } & A\_X\lambda = B\_Xd \text{ and } A\_Y\mu = B\_Yd,\end{array} \tag{16}$$

$$\begin{array}{ll}\text{minimize} & c\_X^\mathsf{T} \lambda\\ \text{subject to } A\_X \lambda = B\_X M^\mathsf{T} d \text{, and} \end{array} \tag{17}$$

$$\begin{array}{ll}\text{minimize} & c\_X^\top \lambda + c\_Y^\top \mu\\ \text{subject to } & A\_X \lambda - B\_X(\alpha - \beta) = 0 \text{ and} \\ & A\_Y \mu + B\_Y(\alpha - \beta) = B\_Y d. \end{array} \tag{18}$$

This construction follows as a special case of [7], which we extend with the support function of a map through an interval matrix.

The time abstraction of Sect. 4 additionally requires us to represent the map of sets of states through interval matrices. Precisely, we are given a convex set of nonnegative values X ⊆ ℝ^n_{≥0}, the coefficients for the respective LP, and an interval matrix [M̲, M̄] ⊆ ℝ^{n×n}, and we aim at computing the support function of all values in X mapped by all matrices in [M̲, M̄]. To this end, we define the LP

$$\begin{array}{ll}\text{minimize} & c\_X^{\mathsf{T}} \lambda\\ \text{subject to} & A\_X \lambda + B\_X(\underline{M}^{\mathsf{T}} \mu - \overline{M}^{\mathsf{T}} \nu) = 0 \text{ and} \\ & -\mu + \nu = d,\end{array} \tag{19}$$

over the vectors λ, μ, and ν of nonnegative variables. This linear program corresponds to the dual of the interval matrix map in Eq. 8.

#### **5.2 Computing Template Polyhedra**

We represent all space abstractions X and Y in our procedure by their support functions. In particular, whenever set operations are applied, instead of solving the operation by removing quantifiers, we construct an LP. We delay solving it until we need to compute a template polyhedron. In that case, we compute the D-polyhedron of the set X by computing its support function in each of the directions in D, and constructing the intersection of halfspaces ∩{dᵀx ≤ ρ_X(d) | d ∈ D}.
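Computing the D-polyhedron thus takes one support-function LP per template direction; a minimal sketch for a polyhedron in constraint form (the set and the template are illustrative, and directions in which the set is unbounded simply contribute no constraint):

```python
import numpy as np
from scipy.optimize import linprog

def template_polyhedron(P, q, D):
    """D-polyhedron of {x | P x <= q}: the tightest enclosure
    {x | d^T x <= rho(d) for every d in D}, one LP per direction."""
    rows, offsets = [], []
    for d in D:
        res = linprog(-d, A_ub=P, b_ub=q, bounds=[(None, None)] * P.shape[1])
        if res.status == 0:             # bounded in direction d
            rows.append(d)
            offsets.append(-res.fun)    # rho(d) = max d^T x over the set
    return np.array(rows), np.array(offsets)

# illustrative set: the rotated square |x| + |y| <= 1
P = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])
q = np.ones(4)
# interval (box) template
D = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])

rows, offsets = template_polyhedron(P, q, D)
# the box hull of the diamond is [-1, 1] x [-1, 1]
assert np.allclose(offsets, [1.0, 1.0, 1.0, 1.0])
```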

#### **6 Abstraction Refinement Using Space-Time Interpolants**

The reachability analysis of hybrid automata by means of the combination of interval arithmetic and support functions presented in Sects. 4 and 5 builds an overapproximation of the system dynamics. It is always sound for safety, but it may produce spurious counterexamples, due to an inherent lack of precision of the time abstraction and the polyhedral approximation. The level of precision is given by two factors, namely the choice of time partitioning and the choice of template directions, excluding the parameters for approximation of the exponential function, which we assume constant (see Sect. 4.1). In the following, we present a procedure to extract infeasibility proofs from spurious counterexamples. We produce them in the form of time partitions and bounding polyhedra, which we call space-time interpolants. Space-time interpolants can then be used to properly refine time partitioning and template directions.

Consider the bounded path v_0, e_0, v_1, e_1, ..., v_k, e_k, v_{k+1} over the control graph and a sequence of dwell time intervals [t̲_0, t̄_0], [t̲_1, t̄_1], ..., [t̲_k, t̄_k] emerging from an abstract trajectory. We aim at extracting a sequence X_0, X_1, ..., X_{k+1} of (possibly nonconvex) polyhedra and a sequence T_0, T_1, ..., T_k of refinements of the respective dwell times such that Z_0 ⊆ X_0, jump_0 ∘ intflow_0^{T_0}(X_0) ⊆ X_1, ..., jump_k ∘ intflow_k^{T_k}(X_k) ⊆ X_{k+1}, and X_{k+1} ∩ I_{k+1} is empty. In other words, we want every X_{i+1} to contain all states that can enter mode v_{i+1} after dwelling on v_i between t̲_i and t̄_i time, and the last to be separated from the invariant of mode v_{k+1}. Containment is to hold inductively, namely X_{i+1} has to contain what is reachable from X_i, and the time refinements T_i are to be chosen in such a way that containment holds in the abstraction. Then, we call the sequence X_0, T_0, X_1, T_1, ..., X_k, T_k, X_{k+1} a sequence of space-time interpolants for the path and the dwell times above.

We compute a sequence of space-time interpolants by alternating multiple strategies. First, for the given sequence of dwell times, we attempt to extract a sequence of halfspace interpolants using linear programming (Sect. 6.1). In case of failure, we iteratively partition the dwell times into sets of smaller intervals, separating nonswitching from switching times, until every combination of intervals along the path admits halfspace interpolants (Sect. 6.2). We accumulate all halfspaces to form a sequence of unions of convex polyhedra that, together with the obtained time partitionings, will form a valid sequence of space-time interpolants. Finally, we refine the abstraction using the time partitionings and the outward pointing directions of all computed halfspaces, in order to eliminate the spurious counterexample (Sect. 6.3).

#### **6.1 Halfspace Interpolation**

Halfspace interpolants are the special case of space-time interpolants where every polyhedron in the sequence is defined by a single linear inequality [1]. Indeed, they are the simplest kind of space-time interpolants, and, for the same reason, the ones that best generalize the reachable states along the path. Unfortunately, not all paths admit halfspace interpolants, but, if one such sequence exists, then it can be extrapolated from the solution of a linear program.
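For intuition, a single interpolation step reduces to an ordinary separating-halfspace problem. A sketch for vertex-represented polytopes, which is a simplification of ours and not the paper's LP encoding (Eq. 20 works on constraint representations through duality): we ask for aᵀx ≤ b containing the reachable set and excluding the bad set, normalizing the separation gap to 1, which is an LP feasibility problem.

```python
import numpy as np
from scipy.optimize import linprog

def halfspace_interpolant(X_verts, B_verts):
    """Find a halfspace a^T x <= b with conv(X) inside it and conv(B)
    strictly outside (gap normalized to 1), or None if none exists.
    LP variables: the stacked vector (a, b), both free."""
    n = X_verts.shape[1]
    # a^T x - b <= 0 for vertices x of X; -a^T y + b <= -1 for vertices y of B
    A_ub = np.vstack([
        np.hstack([X_verts, -np.ones((len(X_verts), 1))]),
        np.hstack([-B_verts, np.ones((len(B_verts), 1))]),
    ])
    b_ub = np.concatenate([np.zeros(len(X_verts)), -np.ones(len(B_verts))])
    res = linprog(np.zeros(n + 1), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * (n + 1))
    return (res.x[:n], res.x[n]) if res.status == 0 else None

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])   # reachable triangle
B = np.array([[3.0, 3.0], [4.0, 3.0], [3.0, 4.0]])   # bad region
a, b = halfspace_interpolant(X, B)
assert (X @ a <= b + 1e-6).all() and (B @ a >= b + 1 - 1e-6).all()
```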

Consider a path v_0, e_0, ..., v_{k+1} with the respective dwell times [t̲_0, t̄_0], ..., [t̲_k, t̄_k]. A sequence of halfspace interpolants consists of a sequence of sets H_0, ..., H_{k+1}, each either a halfspace, the empty set, or the universe, such that Z_0 ⊆ H_0, jump_0 ∘ intflow_0^{[t̲_0,t̄_0]}(H_0) ⊆ H_1, ..., jump_k ∘ intflow_k^{[t̲_k,t̄_k]}(H_k) ⊆ H_{k+1}, and H_{k+1} ∩ I_{k+1} is empty. In contrast with general space-time interpolants, every time partition consists of a single time interval and therefore the support function of every post operator jump ∘ intflow^[t̲,t̄] can be encoded into a single LP (see Sect. 5). We exploit the encoding for extracting halfspace interpolants, similarly to a recent interpolation technique for PCD systems [7].

We encode the support function in direction d of the closure of the image of the post operators along the path, i.e., the set jump_k ∘ intflow_k^{[t̲_k,t̄_k]} ∘ ⋯ ∘ jump_0 ∘ intflow_0^{[t̲_0,t̄_0]}(Z_0), intersected with the invariant I_{k+1}. We obtain the following LP over the free vectors α_0, ..., α_{k+1} and the nonnegative vectors β, δ_0, ..., δ_k, γ_0, ..., γ_{k+1}, μ_0, ..., μ_k, and ν_0, ..., ν_k:

$$\begin{array}{ll}\text{minimize} & q\_{Z\_{0}}^{\sf T}\beta + \sum\_{i=0}^{k} (q\_{I\_{i}}^{\sf T}\gamma\_{i} + q\_{G\_{i}}^{\sf T}\delta\_{i} + s\_{i}^{\sf T}\alpha\_{i+1}) + q\_{I\_{k+1}}^{\sf T}\gamma\_{k+1} \\ \text{subject to } & P\_{Z\_{0}}^{\sf T}\beta &= \alpha\_{0}, \\ & \underline{M}\_{i}^{\sf T}\mu\_{i} - \overline{M}\_{i}^{\sf T}\nu\_{i} &= -\alpha\_{i} & \text{for each } i \in [0..k], \\ & -\mu\_{i} + \nu\_{i} + P\_{I\_{i}}^{\sf T}\gamma\_{i} + P\_{G\_{i}}^{\sf T}\delta\_{i} = R\_{i}^{\sf T}\alpha\_{i+1} & \text{for each } i \in [0..k], \\ & P\_{I\_{k+1}}^{\sf T}\gamma\_{k+1} &= -\alpha\_{k+1} + d, \\ \end{array} \tag{20}$$

where every system of inequalities Px ≤ q corresponds to the constraints of the respective init, guard, or invariant, every R_i x + s_i is an update equation, and every interval matrix [M̲_i, M̄_i] = intexp(A_i, t̲_i, t̄_i). In general, one can check whether the closure is contained in a halfspace aᵀx ≤ b by setting the direction to its linear term, d = a, and checking whether the objective function can equal its constant term b. In particular, we check for emptiness, which we pose as checking inclusion in 0ᵀx ≤ −1. Therefore, we set d = 0 and require the objective function to equal −1. Upon an affirmative answer, from the solution α*_0, α*_1, ..., ν*_k we obtain a valid sequence of halfspace interpolants whose i-th linear term is given by α*_i and whose i-th constant term is given by qᵀ_{Z_0} β* + Σ_{j=0}^{i−1} (qᵀ_{I_j} γ*_j + qᵀ_{G_j} δ*_j + sᵀ_j α*_{j+1}).

```
input : sequence of intervals [u̲_0, ū_0], ..., [u̲_j, ū_j]
output: set of intervals
1  b ← u̲_j;
2  while b < ū_j do
3      a ← b;
4      b ← b + ε;
5      c ← ū_j;
6      if [u̲_0, ū_0], ..., [u̲_{j−1}, ū_{j−1}], [a, b] does not admit halfspace interpolants then
7          continue;
8      if [u̲_0, ū_0], ..., [u̲_{j−1}, ū_{j−1}], [a, c] admits halfspace interpolants then
9          push [a, c] to the output;
10         return;
11     while c − b > ε do
12         if [u̲_0, ū_0], ..., [u̲_{j−1}, ū_{j−1}], [a, ε⌊(b+c)/2ε⌋] admits halfspace interpolants then
13             b ← ε⌊(b+c)/2ε⌋;
14         else
15             c ← ε⌊(b+c)/2ε⌋;
16     push [a, b] to the output;
```
**Algorithm 2.** Nonswitching time partitioning.
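The core of Algorithm 2 is a binary search on the ε-grid for the largest right endpoint that still admits halfspace interpolants. A standalone sketch of that inner loop, with an abstract monotone predicate `admits` standing in for the LP check (the function names are ours, for illustration):

```python
def snap(x, eps):
    """Round x down to the epsilon grid: the update b <- eps*floor((b+c)/(2*eps))."""
    return eps * int(x / eps)

def maximal_endpoint(a, b, c, eps, admits):
    """Grow [a, b] to the largest [a, b'] on the eps-grid with admits(a, b'),
    assuming admits(a, b) holds, admits(a, c) may fail, and admits is
    monotone in the right endpoint (lines 8-15 of Algorithm 2)."""
    if admits(a, c):
        return c
    while c - b > eps:
        mid = snap((b + c) / 2, eps)
        if admits(a, mid):
            b = mid
        else:
            c = mid
    return b

# synthetic predicate: interpolants exist while the endpoint stays below 0.7
admits = lambda a, b: b <= 0.7
end = maximal_endpoint(0.0, 0.1, 1.0, 1 / 64, admits)
# the result is the largest grid point not exceeding the true boundary
assert admits(0.0, end) and not admits(0.0, end + 1 / 64)
```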

#### **6.2 Time Partitioning**

Halfspace interpolation attempts to compute a sequence of enclosures that are convex for a sequence of sets that are not necessarily convex. Specifically, it requires each halfspace to enclose the set of solutions of a linear differential equation, which is nonconvex, by enclosing its convex overapproximation along a whole time interval. As a result, large time intervals produce large overapproximations, on which halfspace interpolation might be impossible. Likewise, shorter intervals produce tighter overapproximations, which are more likely to admit halfspace interpolants. In this section, we exploit such observation to enable interpolation over large time intervals. In particular, we properly partition the time into smaller subintervals and we treat each of them as a halfspace interpolation problem. Later, we combine the results to refine the abstraction.

Time partitioning is a delicate task in the whole abstraction refinement loop. In fact, while template refinement affects the performance of the abstractor linearly, partitioning time intervals that can switch induces branching in the search, possibly leading to an exponential blowup. For this reason, we partition time by narrowing down the switching time, for incremental precision, until none is left. In particular, we use Algorithm 2 to compute a set N of maximal intervals that admit halfspace interpolants, by enlarging or narrowing them by ε amounts. We embed this procedure in Algorithm 3 which, along the sequence, excludes the time in N, constructing a set of intervals S that overapproximates the switching time. In particular, we construct the set with the widest possible intervals that are disjoint from N. Algorithm 3 succeeds when no more intervals are left; otherwise we halve ε and reapply it to the sequences that are left to process.

**input** : sequence of intervals [t̲_0, t̄_0], ..., [t̲_k, t̄_k] **output**: set of sequences of intervals

```
push [t̲_0, t̄_0] to the queue Q;
while Q is not empty do
    pop [u̲_0, ū_0], ..., [u̲_j, ū_j] from Q;
    N ← nonswitching time partitioning of [u̲_0, ū_0], ..., [u̲_j, ū_j];
    foreach [a̲, ā] ∈ N do
        push [u̲_0, ū_0], ..., [u̲_{j−1}, ū_{j−1}], [a̲, ā] to the output;
    if j = k then
        assert [u̲_j, ū_j] \ ∪N = ∅;
        continue;
    S ← choose set of intervals that cover [u̲_j, ū_j] \ ∪N;
    foreach [b̲, b̄] ∈ S do
        push [u̲_0, ū_0], ..., [u̲_{j−1}, ū_{j−1}], [b̲, b̄], [t̲_{j+1}, t̄_{j+1}] to Q;
```

**Algorithm 3.** Dwell time partitioning.

#### **6.3 Abstraction Refinement**

The procedures above construct sequences of time intervals [u̲_0, ū_0], ..., [u̲_j, ū_j] that are included in [t̲_0, t̄_0], ..., [t̲_k, t̄_k] and that, together with the respective halfspace interpolants, constitute a proof of infeasibility for the counterexample. Yet, they do not form a sequence of space-time interpolants X_0, T_0, ..., X_{k+1}. We form each partitioning T_i by splitting [t̲_i, t̄_i] in such a way that each element of T_i is either contained in [u̲_i, ū_i] or disjoint from it, for all intervals [u̲_i, ū_i]. Then, we refine the partitioning of mode v_i similarly. Each polyhedron X_i is a union of convex polyhedra, each of which is the intersection of all halfspaces H_i corresponding to some sequence [u̲_0, ū_0], ..., [u̲_i, ū_i]. Nevertheless, to refine the abstraction we do not need to construct X_i; we just take the outward pointing directions of all H_i and add them to the template of v_i.

#### **7 Experimental Evaluation**

We implemented our method in C++ using GMP and Eigen for multiple-precision linear algebra, Arb for interval arithmetic, and PPL for linear programming [5,23]. In particular, all the libraries we use are meant to provide guaranteed solutions, as is our implementation. We evaluate it on several instances of a *filtered oscillator* and a *rod reactor*, which are both parametric in the number of variables, and the latter in the number of modes too [15,35]. We record several statistics from every execution of our tool: the number #cex of counterexamples found during the CEGAR loop, the number #dir of linearly independent directions, and the average width of the time partitionings extracted from all space-time interpolants. Moreover, we independently measure three times. First, the time spent in finding counterexamples, namely the total time taken by inconclusive abstractions which returned a spurious counterexample. Second, the refinement time, that is, the total time consumed by computing space-time interpolants. Finally, the verification time, that is, the time spent in the last abstraction of the CEGAR loop, which terminates with a fixpoint proving the system safe. We compare the outcome and the performance of our tool against Ariadne which, to the best of our knowledge, is the only other verification tool available that is numerically sound and time-unbounded [11].


**Table 1.** Statistics for the benchmark examples (oot when > 1000 s).

The filtered oscillator is a hybrid automaton with four modes that smoothens a signal x into a signal z. It has k + 2 variables and a system of k + 2 affine ODEs, where k is the order of the filter. Table 1 shows the results for a scaling of k up to the 11th order. The first observation is that the CEGAR loop behaves quite similarly on all scalings: the number of counterexamples, the number of directions, and the time partitionings are almost identical. On the other hand, the computation times grow, particularly in the refinement phase, which dominates over abstraction and verification. This suggests that our procedure efficiently exploits the symmetries of the benchmark. In particular, time partitioning seems unaffected. What affects the performance is linear programming, whose size depends on the number of variables of the system.

The rod reactor consists of a heating reactor tank and k rods, each of which cools the tank for some amount of time, with at most one rod active at a time. The hybrid automaton has one variable x for the temperature, k clock variables, one heating mode, one error mode, and k cooling modes. If the temperature reaches a critical threshold and no rod can intervene, the automaton enters the error mode. For this benchmark, we start with a simple template, the interval around x, and discover further directions. Table 1 highlights two fundamental differences with the previous benchmark. First, the average width grows with the model size. This is because the heating mode requires a finer time partitioning than the cooling modes. The number of cooling modes increases with the number of rods, and so does the average width over all time partitions. Second, while for the filtered oscillator the difficulty lay in interpolation, for the rod reactor both interpolation and finding counterexamples are rather easy. Most of the time is spent in the verification phase, where all fixpoint checks must be concluded without being interrupted by a counterexample. This shows the advantage of our lazy approach, which first processes the counterexamples and only finally proves the fixpoint.

Our method outperforms Ariadne on all benchmarks. On the other hand, tools like Flow\* and SpaceEx can be dramatically faster [9]. For instance, they analyze filtosc 8th ord in 9.1 s and 0.36 s, respectively (with a time horizon of 4 and a jump depth of 10). This is hardly surprising, as our method has primarily been designed to comply with soundness and time-unboundedness, and pays the price for that.

#### **8 Related Work**

There is a rich literature on CEGAR approaches for hybrid automata, either abstracting to a purely discrete system [3,10,27,33,34] or to a hybrid automaton with simpler dynamics [22,30]. Both categories exploit the principle that the verification step is easier to carry out in the abstract domain. The abstraction entails a considerable loss of precision that can only be counteracted by increasing the number of abstract states. This leads to a state explosion that severely limits the applicability of such approaches. In contrast, our approach allows us to increase the precision by adding template directions, which does not increase the number of abstract states. The only case where we incur additional abstract states is when partitioning the time domain. This is a direct consequence of the nonconvexity of flowpipes of affine systems, and therefore seems to be unavoidable when using convex sets in abstractions. In [26], the abstraction consists of removing selected ODEs entirely. This reduces the complexity, but does not achieve any fine-tuning between accuracy and complexity. Template reachability has been shown to be very effective both in scaling up reachability tasks through more efficient successor computations [15,31,32] and in achieving termination even over unbounded time horizons [12]. The drawback of templates is the lack of accuracy, which may lead to an approximation error that accumulates excessively. Efforts to dynamically refine templates have so far not scaled well for affine dynamics [14]. A single-step refinement was proposed in [4], but as was illustrated in [7], the refinement needs to be inductive in order to exclude counterexamples in a CEGAR scheme.

#### **9 Conclusion**

We have developed an abstraction refinement scheme that combines the efficiency and scalability of template reachability with just enough precision to exclude all detected paths to the bad states. At each iteration of the refinement loop, only one template direction is added per mode and time-step. This does not increase the number of abstract states. Additional abstract states are only introduced when required by the nonconvexity of flowpipes of affine systems, a problem that we consider unavoidable. In contrast, existing CEGAR approaches for hybrid automata tend to suffer from state explosion, since refining the abstraction immediately requires additional abstract states. As our experiments confirm, our approach results in templates of very low complexity and terminates with an unbounded proof of safety after a relatively small number of iterations. Further research is required to extend this work to nondeterministic and nonlinear dynamics.

**Acknowledgments.** We thank Luca Geretti for helping us set up Ariadne. This research was supported in part by the Austrian Science Fund (FWF) under grants S11402-N23 (RiSE/SHiNE) and Z211-N23 (Wittgenstein Award), and by the European Commission under grant 643921 (UnCoVerCPS).

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **Monitoring Weak Consistency**

Michael Emmi1(B) and Constantin Enea<sup>2</sup>

<sup>1</sup> SRI International, New York, NY, USA michael.emmi@sri.com <sup>2</sup> IRIF, Univ. Paris Diderot and CNRS, Paris, France cenea@irif.fr

**Abstract.** High-performance implementations of distributed and multicore shared objects often guarantee only the weak consistency of their concurrent operations, foregoing the de-facto yet performance-restrictive consistency criterion of linearizability. While such weak consistency is often vital for achieving performance requirements, practical automation for checking weak-consistency is lacking. In principle, algorithmically checking the consistency of executions according to various weak-consistency criteria is hard: in addition to the enumeration of linearizations of an execution's operations, such criteria generally demand the enumeration of possible visibility relations among the linearized operations; a priori, both enumerations are exponential.

In this work we identify an optimization to weak-consistency checking: rather than enumerating every possible visibility relation, it suffices to consider only the *minimal* visibility relations which adhere to the various constraints of the given criterion, for a significant class of consistency criteria. We demonstrate the soundness of this optimization, and describe an associated minimal-visibility consistency checking algorithm. Empirically, we show that our algorithm significantly outperforms the baseline weak-consistency checking algorithm, which naïvely enumerates all visibilities, and adds only modest overhead to the baseline linearizability checking algorithm, which does not enumerate visibilities.

**Keywords:** Linearizability · Consistency · Runtime verification

#### **1 Introduction**

Programming software applications that can deal with multiple clients at the same time, and possibly with clients that connect at different sites in a network, relies on optimized concurrent or distributed objects which encapsulate lock-free shared memory access or message passing protocols into high-level abstract data types. Given the potentially-enormous amount of software that relies on

This work is supported in part by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No 678177).

c The Author(s) 2018

H. Chockler and G. Weissenbacher (Eds.): CAV 2018, LNCS 10981, pp. 487–506, 2018. https://doi.org/10.1007/978-3-319-96145-3\_26

these objects, it is important to maintain precise specifications and ensure that implementations adhere to their specifications.

One of the standard correctness criteria used in this context is linearizability (or strong consistency) [22], which ensures that the results of concurrently-executed invocations match the results of some serial execution of those same invocations. Ensuring such a criterion in a distributed context (when data is replicated at different sites in a network) is practically infeasible or even impossible [17,19]. Therefore, various weak consistency criteria have been proposed like eventual consistency [23,36], "session guarantees" like read-my-writes or monotonic-reads [35], causal consistency [25,28], etc.

An axiomatic framework for formalizing such criteria has been proposed by Burckhardt et al. [9,11]. Essentially, this extends the linearizability-based specification methodology with a dynamic *visibility* relation among operations, in addition to the standard dynamic *happens-before* and *linearization* relations. Permitting weaker visibility relations models outcomes in which an operation may not observe the effects of concurrent operations that are linearized before it.

In this work, we propose an online monitoring algorithm that checks whether an execution of a concurrent (or distributed) object satisfies a consistency model defined in this axiomatic framework. This algorithm constructs a linearization and visibility relation satisfying the axioms of the consistency model gradually as the execution extends with more operations. It is possible that the linearization and visibility constructed until some point in time are invalidated as more operations get executed, which requires the algorithm to backtrack and search for different candidates. This exponential blow-up is unavoidable since even the problem of checking linearizability is NP-hard in general [18].

The main difficulty in devising such an algorithm is coming up with efficient strategies for enumerating linearizations and visibility relations which minimize the number of candidates needed to be explored and the number of times the algorithm has to backtrack. We build on previous works that propose such strategies for enumerating linearizations [29,38] in the context of linearizability checking. Roughly, the linearizations are extended iteratively by appending operations which are minimal in the happens-before order (among non-linearized operations). The choice of the minimal operations to append varies from one approach to the other. Our work focuses on combining such strategies with an efficient enumeration of visibility relations which are compatible with a given linearization.

Rather than specializing our results to one single consistency model, we consider a general class of consistency models from Burckhardt et al.'s axiomatic framework [9,11] in which the visibility relation among operations is constrained to be contained in the linearization relation. That class includes, for instance, time-stamp based models employed in distributed object implementations, in which time stamps serve to resolve conflicts by effectively linearizing concurrent operations. We show that within this class of consistency models, it is *not* necessary to enumerate the set of all possible visibility relations (included in the linearization) in order to check consistency of an execution. More precisely, we develop an algorithm for enumerating visibility relations that traverses operations in linearization order and chooses, for each operation o, a *minimal* set of operations visible to o that conforms to the consistency axioms (up to the linearization prefix that includes o). In general there may exist multiple such minimal sets of operations, and each of them must be explored. When the visibility relation cannot be extended, the algorithm needs to backtrack and choose different minimal visibility sets for previous operations. However, when all the minimal candidates have been explored, the algorithm can soundly report that the execution is not consistent, without resorting to the exploration of non-minimal visibility relations.

Besides demonstrating the soundness of minimal-visibility consistency checking, we also demonstrate its empirical impact by applying our algorithm to concurrent traces of Java concurrent data structures. We find that our algorithm consistently outperforms the baseline naïve approach to enumerating visibilities, which also considers non-minimal visibility relations. Furthermore, we demonstrate that minimal-visibility checking adds only modest overhead (roughly 2×) to the baseline linearizability checking algorithm, which does not enumerate visibilities. This suggests that small sets of minimal visibilities typically suffice in practice, and that the additional exponential enumeration of visibilities, atop the exponential enumeration of linearizations, may be avoidable in practice. Our implementation and experiments are open source, and publicly available on GitHub.<sup>1</sup>

In summary, this work makes the following contributions:


To the best of our knowledge, our algorithm is the first completely automatic algorithm for checking weak-consistency of arbitrary abstract data type implementations which avoids the naïve enumeration of all possible visibility relations.

The rest of this paper is organized as follows. Section 2 elaborates a formalization of Burckhardt et al.'s axiomatic consistency framework [9,11], and Sect. 3 develops a formal argument to the soundness of considering only minimal visibility relations. Section 4 describes our overall consistency checking algorithms, and Sect. 5 describes our implementation and empirical evaluation. Section 6 describes related work, and finally Sect. 7 concludes.

### **2 Weak Consistency**

We describe a formal model for concurrent (distributed) object implementations. Clients interact with an object by making *invocations* from a set I and receiving

<sup>1</sup> https://github.com/michael-emmi/violat/releases/tag/cav-2018-submission.

**Fig. 1.** A history *h* and an abstract execution containing *h*.

*returns* from a set R (parameters of invocations, if any, are part of the invocation name). An *operation* is an invocation i ∈ I paired with a return r ∈ R; we denote such an operation by i ⇒ r. We denote individual operations by o. The invocation, resp., the return, in an operation o is denoted by *inv*(o), resp., *ret*(o).

The interaction between a client and an object is represented by a *history* ⟨*po*, *hb*⟩ over a set of operations O which consists of

– a *program (order) po* which is a partial order on O, and

– a *happens-before (order) hb* which is a partial order on O.

The program order is enforced by the client, e.g., by invoking a set of operations within the same thread or process, while the happens-before order represents the order in which the operations finished, i.e., ⟨o<sub>1</sub>, o<sub>2</sub>⟩ ∈ *hb* iff operation o<sub>1</sub> finished before o<sub>2</sub> started. We assume that the program order is included in the happens-before order.
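The real-time reading of happens-before can be sketched directly. Here we assume, hypothetically, that each operation carries invocation and return timestamps:

```python
def happens_before(ops):
    """Happens-before from real-time intervals: o1 precedes o2 iff o1
    returned before o2 was invoked. ops maps operation names to
    (start, end) timestamp pairs -- an illustrative encoding, not the
    paper's."""
    return {(o1, o2) for o1 in ops for o2 in ops
            if ops[o1][1] < ops[o2][0]}
```

Overlapping operations, such as the concurrent ones in Fig. 1(a), are simply unordered by this relation.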

*Example 1.* Let us consider a key-value map ADT containing operations of the form put(key, value) ⇒ old, which insert key-value pairs and return previously-mapped values for the given keys, remove(key) ⇒ value, which remove key mappings and return previously-mapped values, contains(value) ⇒ true/false, which test whether values are currently mapped, and get(key) ⇒ value, which return currently-mapped values for the given keys. Figure 1(a) pictures a history h where edges denote the program order *po* and happens-before *hb*. Such a history can be obtained by a client with three threads each making two invocations (the invocations within the same thread are aligned vertically).

The axiomatic specifications of concurrent objects we consider are based on the following abstract representation of executions: an *abstract execution* over operations O is a tuple ⟨*po*, *hb*, *lin*, *vis*⟩ that consists of a history ⟨*po*, *hb*⟩ over O, a *linearization (order) lin*,<sup>2</sup> which is a total order on O, and a *visibility* relation *vis*, which is an acyclic relation on O.


<sup>2</sup> The linearization is also called *arbitration* in previous works, e.g., [9].

Intuitively, the visibility relation represents the inter-thread communication, how effects of operations are visible to other threads, while the linearization order models the "conflict resolution policy", how the effects of concurrent operations are ordered when they become visible to other threads.

We say that an operation o<sub>1</sub> such that ⟨o<sub>1</sub>, o<sub>2</sub>⟩ ∈ *vis* is *visible* to o<sub>2</sub>, and that o<sub>2</sub> *sees* o<sub>1</sub>. Also, the set of operations visible to o<sub>2</sub> is called the *visibility set* of o<sub>2</sub>. The extensions of *inv* and *ret* to partial orders on O are defined component-wise as usual.

*Example 2.* Figure 1(b) pictures an abstract execution containing the history in Fig. 1(a). The visibility relation is defined by the edges labeled *vis* together with their transitive closure. The linearization order is defined by the order in which operations are written (from top to bottom).

A consistency criterion for concurrent objects is defined by a set of axioms over the relations in an abstract execution. These axioms relate abstract executions to a sequential semantics of the operations, which is defined by a function *Spec* : I<sup>∗</sup> × I → R that determines the return value of an invocation given the sequence of invocations previously executed on the object<sup>3</sup>.

*Example 3.* The sequential semantics of the key-value map ADT considered in Example 1 is defined as expected. For instance, the return value of put(key, value) after a sequence of invocations σ is the value null if σ contains no invocation put(key,...), or old if put(key, old) is the last invocation of the form put(key,...) in σ.
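The sequential semantics of Example 1's key-value map can be sketched as a function in the shape of *Spec*. The tuple encoding of invocations and the helper `apply_inv` are our own assumptions, not the paper's:

```python
def spec(history, inv):
    """Sequential semantics Spec : I* x I -> R for the key-value map ADT,
    sketched with invocations encoded as tuples like ('put', key, value),
    ('get', key), ('remove', key), ('contains', value)."""
    state = {}
    for op in history:       # replay the previously executed invocations
        apply_inv(state, op)
    return apply_inv(state, inv)

def apply_inv(state, inv):
    # Applies one invocation to the map state and returns its value.
    if inv[0] == 'put':
        _, key, value = inv
        old = state.get(key)     # previously-mapped value, or None (null)
        state[key] = value
        return old
    if inv[0] == 'get':
        return state.get(inv[1])
    if inv[0] == 'remove':
        return state.pop(inv[1], None)
    if inv[0] == 'contains':
        return inv[1] in state.values()
```

For instance, `spec([('put', 1, 0), ('get', 1), ('put', 1, 1)], ('contains', 0))` yields `False`, matching the context of contains(0) ⇒ false discussed for Fig. 1(b).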

The *domain* dom(R) of a relation R is the set of elements x such that ⟨x, y⟩ ∈ R for some y; the *codomain* codom(R) is the set of elements y such that ⟨x, y⟩ ∈ R for some x. By an abuse of notation, if x is an individual element, x ∈ R denotes the fact that x ∈ dom(R) ∪ codom(R). The *(left) composition* R<sub>1</sub> ◦ R<sub>2</sub> of two binary relations R<sub>1</sub> and R<sub>2</sub> is the set of pairs ⟨x, z⟩ such that ⟨x, y⟩ ∈ R<sub>1</sub> and ⟨y, z⟩ ∈ R<sub>2</sub> for some y. We denote the identity binary relation {⟨x, x⟩ : x ∈ X} on a set X by [X], and we write [x] to denote [{x}].
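On finite relations represented as sets of pairs, these operations are straightforward to realize; the helper names below are our own:

```python
def dom(r):
    # Domain of a binary relation given as a set of pairs.
    return {x for (x, _) in r}

def codom(r):
    # Codomain of a binary relation.
    return {y for (_, y) in r}

def compose(r1, r2):
    # (Left) composition: pairs (x, z) with (x, y) in r1 and (y, z) in r2.
    return {(x, z) for (x, y) in r1 for (y2, z) in r2 if y == y2}

def identity(xs):
    # The identity relation [X] on a set X.
    return {(x, x) for x in xs}
```

Note that restricting a relation to a set, written later as vis ◦ [O<sub>i</sub>], is then just composition with an identity relation.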

*Return-value consistency* [9], a variant of eventual consistency without liveness guarantees, states that the return r of every operation o = i ⇒ r can be obtained from a sequential execution of i that follows the invocations visible to o (in the linearization order). This constraint will be formalized as an axiom called Ret. The visibility relation can be chosen arbitrarily. Standard "session guarantees" can be described in the same framework by adding constraints on the visibility relation: for instance, *read my writes*, i.e., operations previously executed in the same thread remain visible, can be stated as vis ⊇ po, and *monotonic reads*, i.e., the set of operations visible to some thread grows monotonically over time, can

<sup>3</sup> Previous works have considered more general, concurrent semantics for operations. We restrict ourselves to sequential semantics in order to simplify the exposition. Our results extend easily to the general case.

**Fig. 2.** The grammar of consistency axioms.

**Fig. 3.** Consistency axiom satisfaction for abstract executions. The satisfaction relation *|*= is implicitly parameterized by a sequential semantics *Spec* which we consider fixed.

be stated as vis ⊇ vis ◦ po. Then, a version of causal consistency [7,9], called *causal convergence*, is defined by the following set of axioms:

$$\mathsf{vis} \supseteq \mathsf{vis} \circ \mathsf{vis} \qquad \mathsf{vis} \supseteq \mathsf{po} \qquad \mathsf{lin} \supseteq \mathsf{vis} \qquad \mathsf{Ret}$$

which state that the visibility relation is transitive, it includes program order, and it is included in the linearization order. Finally, *linearizability* is defined by the set of axioms lin ⊇ hb, vis = lin, and Ret.
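On a finite abstract execution, the relational axioms of causal convergence can be checked directly. The sketch below is illustrative only (the Ret axiom, which needs the sequential semantics, is omitted), and relations are sets of pairs:

```python
def compose(r1, r2):
    # (Left) composition of binary relations given as sets of pairs.
    return {(x, z) for (x, y) in r1 for (y2, z) in r2 if y == y2}

def check_causal_convergence(po, vis, lin):
    """Checks the relational axioms of causal convergence on finite
    relations; `<=` is Python's subset test. A sketch, not the paper's
    implementation, and the Ret axiom is not checked."""
    return (compose(vis, vis) <= vis   # vis ⊇ vis ∘ vis (transitivity)
            and po <= vis              # vis ⊇ po
            and vis <= lin)            # lin ⊇ vis
```

A non-transitive visibility relation, for instance, is rejected even when it is included in the linearization.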

To state our results in a general context that concerns multiple consistency criteria defined in the literature (including the ones mentioned above) and variations thereof, we consider a language of *consistency axioms* φ defined by the grammar in Fig. 2. A *consistency model* Φ is a set {φ<sub>1</sub>, φ<sub>2</sub>, ...} of consistency axioms.

In the following, we assume that every consistency model is stronger than return-value consistency and, also, that the linearization order is consistent with the visibility and happens-before relations. The assumptions concerning the linearization order correspond, for instance, to implementations in which concurrent operations are ordered using timestamps that are consistent with real time. Formally, we assume that every consistency model contains the axioms

$$\Phi_0 = \{\mathsf{Ret},\ \mathsf{lin} \supseteq \mathsf{vis},\ \mathsf{lin} \supseteq \mathsf{hb}\}.$$

Figure 3 defines the precise semantics of consistency axioms on abstract executions: the *context* of an operation o according to a linearization *lin* and visibility *vis*, denoted *ctxt*(*lin*, *vis*, o), is the restriction [O<sub>o</sub>] ◦ *lin* ◦ [O<sub>o</sub>] of *lin* to the operations O<sub>o</sub> = dom(*vis* ◦ [o]) visible to o. For instance, for the abstract execution in Fig. 1(b), *ctxt*(*lin*, *vis*, contains(0) ⇒ false) is the sequence of operations put(1, 0) ⇒ null; get(1) ⇒ 0; put(1, 1) ⇒ 0.
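The context of an operation can be read off a finite abstract execution as follows. Representing *lin* as a list (in linearization order) and *vis* as a set of pairs is our own encoding:

```python
def ctxt(lin, vis, o):
    """Context of operation o: the operations visible to o, read off in
    linearization order. lin is a list of operations, vis a set of
    pairs (p, q) meaning p is visible to q -- an illustrative sketch."""
    visible = {p for (p, q) in vis if q == o}   # dom(vis ∘ [o])
    return [p for p in lin if p in visible]     # restriction of lin
```

The Ret axiom then asks that o's return value agree with the sequential semantics *Spec* applied to this sequence.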

We extend this semantics to consistency models as e |= Φ iff e |= φ for all φ ∈ Φ and to histories as:

$$\langle po, hb \rangle \models \Phi \quad \text{iff} \quad \exists lin, vis.\ \langle po, hb, lin, vis \rangle \models \Phi$$

*Example 4.* The abstract execution in Fig. 1(b) satisfies causal convergence: the visibility relation is transitive, it includes program order, and it is consistent with the linearization order. Moreover, the axiom Ret is also satisfied. For instance, the invocation contains(0) returns exactly false when executed after put(1, 0); get(1); put(1, 1). Similarly, it returns true when executed after put(1, 0); get(1); put(0, 0).

#### **3 Minimal Visibility Extensions**

Checking whether a given history satisfies a consistency model is intractable in general. This essentially follows from the fact that checking linearizability is NP-hard in general [18]. While the main issue in checking linearizability is enumerating the exponentially many linearizations, checking weaker criteria like causal convergence requires also an enumeration of the exponentially many visibility relations (included in a given linearization). We prove in this section that it is enough to enumerate only *minimal* visibility relations (w.r.t. set inclusion), included in a given linearization, in order to conclude whether a given history and linearization satisfy a consistency model.

A *linearized history* σ = ⟨*po*, *hb*, *lin*⟩ consists of a history and a linearization *lin* such that *lin* ⊇ *hb*. The extension of |= to linearized histories is defined as:

$$\langle po, hb, lin \rangle \models \Phi \quad \text{iff} \quad \exists vis.\ \langle po, hb, lin, vis \rangle \models \Phi$$

The i-th element of a sequence s is denoted by s[i] and the prefix of s of length i is denoted by s<sub>i</sub>. The projection of a linearized history σ = ⟨*po*, *hb*, *lin*⟩ to a prefix *lin*<sub>i</sub> of *lin* is denoted by σ<sub>i</sub>. Formally, O<sub>i</sub> = dom(*lin*<sub>i</sub>) ∪ codom(*lin*<sub>i</sub>) and σ<sub>i</sub> = ⟨*po* ∩ (O<sub>i</sub> × O<sub>i</sub>), *hb* ∩ (O<sub>i</sub> × O<sub>i</sub>), *lin*<sub>i</sub>⟩.
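The prefix projection σ<sub>i</sub> can be sketched on the same finite encoding (po and hb as sets of pairs, lin as a list); the function name is our own:

```python
def project(po, hb, lin, i):
    """Projection of a linearized history to the prefix of lin of
    length i: restrict po and hb to the operations occurring in that
    prefix -- an illustrative sketch."""
    ops = set(lin[:i])
    restrict = lambda r: {(x, y) for (x, y) in r if x in ops and y in ops}
    return restrict(po), restrict(hb), lin[:i]
```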

For a linearized history ⟨*po*, *hb*, *lin*⟩ and a consistency model Φ, a visibility relation *vis*<sub>i</sub> on operations from a prefix *lin*<sub>i</sub> of *lin* is called Φ*-extensible* when there exists a visibility relation *vis* ⊇ *vis*<sub>i</sub> such that ⟨*po*, *hb*, *lin*, *vis*⟩ |= Φ. The relation *vis* is called a Φ*-extension of vis*<sub>i</sub> *up to lin*. By extrapolation, a Φ-extension of *vis*<sub>i</sub> *up to lin*<sub>j</sub>, for any i < j, is a visibility relation *vis*<sub>j</sub> such that ⟨σ<sub>j</sub>, *vis*<sub>j</sub>⟩ |= Φ. Such an extension is called *minimal* when for every other Φ-extension *vis*′<sub>j</sub> of *vis*<sub>i</sub> up to *lin*<sub>j</sub>, we have that *vis*′<sub>j</sub> ⊄ *vis*<sub>j</sub>.

*Example 5.* Consider again the abstract execution in Fig. 1(b). Ignoring the edges labeled by *vis*, it becomes a linearized history σ. The prefix σ<sub>2</sub> contains just the two operations put(1, 0) ⇒ null and get(1) ⇒ 0. For causal convergence, the visibility relation *vis*<sub>2</sub> = {⟨put(1, 0) ⇒ null, get(1) ⇒ 0⟩} on operations of σ<sub>2</sub> is extensible, as witnessed by the visibility relation defined for the rest of the operations in this execution. The visibility relation

$$\begin{aligned} vis_3 = \{ & \langle \mathtt{put}(1,0) \Rightarrow \mathtt{null},\ \mathtt{get}(1) \Rightarrow 0 \rangle,\ \langle \mathtt{put}(1,0) \Rightarrow \mathtt{null},\ \mathtt{put}(0,0) \Rightarrow \mathtt{null} \rangle, \\ & \langle \mathtt{get}(1) \Rightarrow 0,\ \mathtt{put}(0,0) \Rightarrow \mathtt{null} \rangle \} \end{aligned}$$

is an extension of *vis*<sub>2</sub> up to *lin*<sub>3</sub>, and contains the operations in σ<sub>2</sub> together with put(0, 0) ⇒ null. Note that this extension is *not* minimal. A minimal extension would be exactly equal to *vis*<sub>2</sub> since, intuitively, put(0, 0) ⇒ null is not required to observe operations on keys other than 0.

The next lemma shows that minimizing the visibility sets of operations in a linearization prefix, while preserving the truth of the axioms on that prefix, doesn't exclude visibility choices for future operations (occurring beyond that prefix). In more precise terms, the Φ-extensibility status is not affected by choosing smaller visibility sets for operations in a linearization prefix. For instance, since the visibility *vis*<sub>3</sub> in Example 5 is extensible (for causal convergence), the smaller visibility relation in which put(0, 0) ⇒ null doesn't see any operation is also extensible. This result relies on the specific form of the axioms, which ensure that smaller visibility sets impose fewer constraints on the visibility sets of future operations. For instance, the axiom *vis* ⊇ *vis* ◦ *vis* enforces that *vis* contains {⟨o, o<sub>2</sub>⟩ : ⟨o, o<sub>1</sub>⟩ ∈ *vis*} whenever a pair ⟨o<sub>1</sub>, o<sub>2</sub>⟩ is added to *vis*. Minimizing the visibility set of o<sub>1</sub> will minimize the set of operations that *must* be seen by o<sub>2</sub>, thus making the choice of the operations visible to o<sub>2</sub> more liberal.

**Lemma 1.** *For every linearized history* σ *and consistency model* Φ*, if*

⟨σ<sub>i</sub>, *vis*<sub>i</sub>⟩ |= Φ*, vis*<sub>i</sub> *is* Φ*-extensible,* ⟨σ<sub>i</sub>, *vis*′<sub>i</sub>⟩ |= Φ*, and vis*′<sub>i</sub> ⊆ *vis*<sub>i</sub>*,*

*then vis*′<sub>i</sub> *is* Φ*-extensible.*

*Proof (Sketch).* We show that the Φ-extension *vis* of *vis*<sub>i</sub> up to *lin* can be transformed into a Φ-extension of *vis*′<sub>i</sub> up to *lin* by simply removing the pairs of operations in *vis*<sub>i</sub> \ *vis*′<sub>i</sub>. Let *vis*′ be this visibility relation and Φ a consistency model. We prove that ⟨*po*, *hb*, *lin*, *vis*′⟩ |= Φ by considering the different types of axioms defined in Fig. 2.

Suppose that Φ contains an axiom of the form vis ⊇ *rel* (according to the notations in Fig. 2). We have that *vis*′<sub>i</sub> ⊇ (*rel*[*po*/po][*hb*/hb][*lin*/lin][*vis*′/vis]) ◦ [O<sub>i</sub>] by the hypothesis (from ⟨σ<sub>i</sub>, *vis*′<sub>i</sub>⟩ |= Φ). Then, *vis*′<sub>i</sub> ⊆ *vis*<sub>i</sub> implies that

$$\begin{aligned} & (rel[po/\mathsf{po}][hb/\mathsf{hb}][lin/\mathsf{lin}][vis/\mathsf{vis}]) \circ [O \setminus O_i] \\ \supseteq\ & (rel[po/\mathsf{po}][hb/\mathsf{hb}][lin/\mathsf{lin}][vis'/\mathsf{vis}]) \circ [O \setminus O_i] \end{aligned}$$

which together with *vis* ◦ [O \ O<sub>i</sub>] = *vis*′ ◦ [O \ O<sub>i</sub>] (the visibility relations *vis* and *vis*′ are the same for operations which are not included in the prefix *lin*<sub>i</sub>) implies that

$$vis' \circ [O \setminus O_i] \supseteq (rel[po/\mathsf{po}][hb/\mathsf{hb}][lin/\mathsf{lin}][vis'/\mathsf{vis}]) \circ [O \setminus O_i].$$

Therefore, ⟨*po*, *hb*, *lin*, *vis*′⟩ |= vis ⊇ *rel*.

The axiom Ret relates the return value of each operation o in σ to the set of operations visible to o. This relation is insensitive to the set of operations seen by an operation before o in the linearization order. Therefore, ⟨*po*, *hb*, *lin*, *vis*′⟩ |= Ret is an immediate consequence of ⟨σ<sub>i</sub>, *vis*′<sub>i</sub>⟩ |= Ret and the fact that *vis* and *vis*′ are the same for operations which are not included in the prefix *lin*<sub>i</sub>.

The axioms of the form lin ⊇ *rel* (according to the notations in Fig. 2) are straightforward implications of lin ⊇ hb and lin ⊇ vis, which are assumed to be included in any consistency model. They hold for any linearized history.

The main result of this section shows that a visibility enumeration strategy that considers operations in the linearization order and computes minimal extensions iteratively, possibly backtracking to another choice of minimal extension if necessary, is complete in general (it finds a visibility relation satisfying the consistency axioms Φ iff the input linearized history satisfies Φ). Backtracking is necessary since, in general, there may exist multiple minimal extensions and all of them should be explored. For a given linearized history σ and visibility relation *vis* on operations of σ, *vis*<sub>i</sub> = *vis* ◦ [O<sub>i</sub>] denotes the restriction of *vis* to operations from the prefix *lin*<sub>i</sub>.

**Theorem 1.** *For every linearized history* σ *and consistency model* Φ*,* σ |= Φ *iff there exists a visibility relation vis such that*

*for every* i*, vis*<sub>i+1</sub> *is a minimal* Φ*-extension of vis*<sub>i</sub> *up to lin*<sub>i+1</sub>*.*

*Proof.* (Sketch) Let σ be a linearized history such that σ |= Φ. Therefore, there exists a visibility relation *vis* such that ⟨σ, *vis*⟩ |= Φ. We prove by induction that there exists a visibility relation *vis*′ satisfying the claim of the theorem. Assume that there exists a Φ-extensible visibility relation *vis*′<sub>j</sub> on operations in *lin*<sub>j</sub> which satisfies the claim of the theorem for every i < j (we take *vis*′<sub>0</sub> = ∅). Let *vis*′<sub>j+1</sub> be a minimal visibility relation on operations in *lin*<sub>j+1</sub> such that *vis*′<sub>j+1</sub> ◦ [O<sub>j</sub>] = *vis*′<sub>j</sub> ◦ [O<sub>j</sub>] and ⟨σ<sub>j+1</sub>, *vis*′<sub>j+1</sub>⟩ |= Φ (such a relation exists because *vis*′<sub>j</sub> is Φ-extensible). By Lemma 1, *vis*′<sub>j+1</sub> is Φ-extensible. Also, *vis*′<sub>j+1</sub> satisfies the claim of the theorem for every i < j + 1. The reverse direction is trivial. □

*Example 6.* In the context of the abstract execution in Fig. 1(b), the visibility relation defined by removing the *vis* edge ending in put(0, 0) ⇒ null, and adding the transitive closure, satisfies the requirements in Theorem 1.

#### **4 Efficient Monitoring of Consistency Models**

We describe an algorithm for checking whether a given history satisfies a consistency model, which combines linearization enumeration strategies proposed in [29,38] with the visibility enumeration strategy proposed in Sect. 3.

The algorithm is defined by the procedure checkConsistency listed in Fig. 4. This recursive procedure searches for extensions of the input linearization and visibility (initially, checkConsistency is called with *lin* = *vis* = ∅) which witness that the input history h satisfies Φ. It assumes that the inputs *lin* and *vis* satisfy the axioms of the consistency model Φ when the input history is projected on the linearized operations (the operations in *lin*). This projection is denoted by h<sub>lin</sub>. Formally, the precondition of this procedure is that (h<sub>lin</sub>, *lin*, *vis*) |= Φ.

The extensions of *lin* and *vis* are built in successive steps. At each step, the linearization is extended according to the procedure linExtensions and the visibility according to the procedure visExtensions.

The abstract implementation of linExtensions, presented in Fig. 4, chooses a set of *non-linearized* operations O which are *minimal* among non-linearized

**Fig. 4.** Checking consistency of a history. The procedures linExtensions, resp., visExtensions return the set of linearizations, resp., visibilities, produced by the instruction yield.

operations w.r.t. happens-before, i.e., returned by minimals(h, *lin*), and appends any linearization of the operations in O to the input linearization *lin*. Formally, O ⊆ {o : o ∉ *lin* and ∀o′. o′ ∉ *lin* ⇒ ¬(o′ ≺ o)}, where ≺ denotes the happens-before relation. The fact that the operations in O are minimal among non-linearized operations ensures that the returned linearizations are consistent with the happens-before order.
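The selection of minimal operations can be illustrated with a short Python sketch. The encoding of a history as a map from each operation to the set of operations that happen before it is our own illustrative assumption, not the paper's; `minimals` and `lin_extensions` correspond to the procedures of the same names.

```python
def minimals(hb_preds, lin):
    """Non-linearized operations that are minimal w.r.t. happens-before:
    no other non-linearized operation happens before them.
    `hb_preds` maps each operation to its happens-before predecessors
    (an illustrative encoding, not the paper's)."""
    pending = [o for o in hb_preds if o not in lin]
    return [o for o in pending
            if not any(p in pending for p in hb_preds[o])]

def lin_extensions(hb_preds, lin):
    """Yield extensions of `lin` by a single minimal operation (the
    strategy of [38]); every yielded sequence remains consistent with
    the happens-before order."""
    for o in minimals(hb_preds, lin):
        yield lin + [o]
```

For example, if `get` happens after `put` but is concurrent with `contains`, only `put` and `contains` are candidates for the first linearization step.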

Two linearization enumeration strategies proposed in the literature can be seen as instances of linExtensions. The strategy in [38] corresponds to the case where O contains exactly one minimal operation. For instance, for the history in Fig. 1(a), this strategy will start by picking a minimal element in the happens-before relation, say put(1, 0) ⇒ null, then a minimal operation among the rest, say get(1) ⇒ 0, and so on.

The strategy proposed in [29] is slightly more involved (and, according to experimental results, more efficient), but it relies on a presentation of histories h as sequences of call and return actions (an operation spanning the time interval between its call and return action). The happens-before order is extracted as usual: an operation o<sub>1</sub> happens before an operation o<sub>2</sub> if its return occurs before the call of o<sub>2</sub>. This strategy defines O as the first non-linearized operation o that returned in h, together with a set of non-linearized operations that are concurrent with o (i.e., are not ordered after o in the happens-before order). The operation o is linearized last in the returned extensions. For instance, consider the history h in Fig. 5 represented as a sequence of call/return actions (small boxes at the beginning, resp., end, of an interval denote call actions, resp., return actions). The first linearization extension (when *lin* = ∅) includes put(1, 0) ⇒ null (the first operation to return) after some sequence of operations concurrent with it, for

**Fig. 5.** The history *h* in Fig. 1 presented as a sequence of call/return actions.

instance the empty sequence. Next, the current linearization put(1, 0) ⇒ null can be extended by adding put(0, 0) ⇒ null (the first operation to return, if we exclude put(1, 0) ⇒ null which is already linearized) and possibly get(1) ⇒ 0 before it. Suppose that we choose put(1, 0) ⇒ null; get(1) ⇒ 0; put(0, 0) ⇒ null. Then, the extension will include put(1, 1) ⇒ 0 and possibly contains(0) ⇒ true or contains(0) ⇒ false, and so on. Compared to the previous strategy, an extension step can add multiple operations.

The extensions of the visibility relation (returned by visExtensions) are minimal Φ-extensions of *vis* up to the input linearization. They can be constructed iteratively by considering the newly linearized operations one by one and each time compute a minimal extension of the visibility. For instance, the linearization construction explained in the previous paragraph can be expanded with a visibility enumeration as follows:


The procedure checkConsistency backtracks to a different extension when the current one cannot be completed to include all the operations in the input history (checked by the recursive call). The correctness of the algorithm is stated in the following theorem.
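The recursive search with backtracking can be sketched in Python as follows. The encodings are our own simplifications, not the paper's: operations are hashable values, happens-before is a set of pairs, Φ is a predicate over a linearization prefix and a visibility relation, and for brevity the candidate visibilities of a newly linearized operation are restricted to prefixes of the current linearization, enumerated largest-first.

```python
def check_consistency(ops, hb, phi, lin=(), vis=frozenset()):
    """Search for a linearization and visibility witnessing that the
    history satisfies phi; backtracks whenever the current extension
    cannot be completed (an illustrative sketch of checkConsistency)."""
    if len(lin) == len(ops):
        return True                                   # witness found
    pending = [o for o in ops if o not in lin]
    for o in pending:
        # only happens-before-minimal pending operations may be linearized
        if any((p, o) in hb for p in pending):
            continue
        lin2 = lin + (o,)
        # candidate visibilities for o, enumerated largest-first since
        # visibility loss is expected to be the exception
        for k in range(len(lin), -1, -1):
            vis2 = vis | {(p, o) for p in lin[:k]}
            if phi(lin2, vis2) and check_consistency(ops, hb, phi, lin2, vis2):
                return True
    return False                                      # backtrack
```

With Φ instantiated as "every operation observes all operations linearized before it" (roughly, atomicity), the procedure degenerates to a linearizability check.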

**Theorem 2.** checkConsistency(h, Φ, ∅, ∅) *returns true iff* h |= Φ*.*

#### **5 Empirical Results**

While our minimal-visibility consistency checking algorithm is applicable to a wide class of distributed and multicore shared object implementations, here we demonstrate its efficacy on histories recorded from executions of Java Development Kit (JDK) Standard Edition concurrent data structures. Recent work demonstrates that JDK concurrent data structures regularly admit non-atomic behaviors, often by design [14]; these weakly-consistent behaviors span many methods of the java.util.concurrent package, including ConcurrentHashMap, ConcurrentSkipListMap, ConcurrentSkipListSet, ConcurrentLinkedQueue, and ConcurrentLinkedDeque; one example is the contains method described in Example 3.

We extracted 4,000 randomly-sampled histories from approximately 8,000 observed over approximately 1,000,000 executions in stress testing 20 randomly-generated client programs of the ConcurrentSkipListMap with up to 15 invocations across up to 3 threads. In each program, the given number of threads invokes its share of randomly-generated methods with randomly-generated values. We consider random generation superior to collecting programs *in the wild*, since found client programs can mask inconsistencies by restricting method argument values, or by being agnostic to inconsistent return values. Furthermore, automated generation gives us the ability to evaluate our algorithm on unbiased sample sets, and avoid any technical problems in the collection of programs; it also allows us to test method combinations which might not appear in publicly-available examples.

We subject each client program to 1 s of stress testing<sup>4</sup> to record histories. The return value of each invocation is stored in a different thread-local variable which is read at the end of the execution. Recording the happens-before order between invocations without significantly affecting implementation behavior (e.g., without influencing the memory orderings between shared-memory accesses) is challenging. For instance, we found the use of high-precision timers to be unsuitable, since the response time of System.nanoTime calls is much higher than that of calls to the implementations under test; invoking such timers between each invocation of implementation methods would prevent implementation methods from overlapping in time, and thus hide any possible inconsistent behaviors. Similarly, the use of atomic operations and volatile variables would impose additional synchronization constraints and prevent many weak-memory reorderings.

Essentially, our solution is to introduce a shared variable per thread storing its program counter – in our context, the program counter stores the number of call and return events executed so far. A thread's program counter is read by every other thread before and after each invocation. Figure 6 demonstrates a simplified version<sup>5</sup> of our encoding for a program with two threads each invoking two methods. The program counter variables pc0 and pc1 are not declared volatile, although such declarations would, in principle, provide stronger guarantees concerning the derived happens-before relation; they would, however, interfere with the implementation's weak-memory effects. The program counter values read by each thread allow

<sup>4</sup> For stress testing we leverage OpenJDK's JCStress tool: http://openjdk.java.net/ projects/code-tools/jcstress/.

<sup>5</sup> In our actual implementation, each program-counter access is encapsulated within a method call in order to avoid compiler reordering between the reads of other threads' counters and the increment of one's own. While the Java memory model does not guarantee that such encapsulation will prevent reordering, we found this solution to be adequate on Oracle's Java SE runtime version 9. Our actual implementation also wraps invocations in try-catch blocks to deal with exceptions.

**Fig. 6.** Our encoding for recording ConcurrentHashMap histories. Each thread's program counter is read before and after other threads' invocations, and incremented subsequent to each such read. The two-dimensional pcs[*n*][*m*] array stores *n* program counter values for *m* neighboring threads.

us to extract a happens-before order between invocations which is *sound* in the sense that the actual happens-before may order more operations, but not fewer – assuming that shared-memory accesses satisfy at least the total-store order (TSO) semantics in which writes are guaranteed to be performed according to program order. For instance, when pcs[0][0] > 2 in the second thread (thread1), the first invocation in the other thread (thread0) happens-before the first invocation in this thread. Otherwise, if pcs[0][0] < 2, then the two invocations are overlapping in time. The latter may not be true in the real happens-before due to the delay in incrementing and reading the program counter variables. Although some loss of precision is possible, we are unaware of other methods for tracking happens-before which avoid significant interference with the implementation under test.
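The extraction step can be sketched as follows. The snapshot encoding is our own simplification of the pcs[*n*][*m*] scheme: each invocation increments its own thread's counter once at its call and once at its return, so a counter value of 2k means k completed invocations.

```python
def happens_before(pre_reads, ops_per_thread):
    """Derive a sound happens-before relation from program-counter
    snapshots: pre_reads[t][j][u] is thread t's read of thread u's
    counter just before its j-th invocation (illustrative encoding,
    simplified from the paper's pcs array)."""
    hb = set()
    for t, reads in pre_reads.items():
        for j, snapshot in enumerate(reads):
            for u, pc in snapshot.items():
                if u == t:
                    continue
                # a counter value pc means pc // 2 invocations of
                # thread u had already returned; each of them
                # happens-before invocation (t, j)
                for i in range(min(pc // 2, ops_per_thread[u])):
                    hb.add(((u, i), (t, j)))
    return hb
```

For example, a pre-call read of 2 from the other thread's counter orders that thread's first invocation before the current one, matching the pcs[0][0] > 2 case discussed above.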

Based on the encoding described above, we generate histories as sequences of call and return actions which serve as input to our consistency checking algorithms. For simplicity, we have considered just two consistency models, linearizability and a weak consistency model defined by {Ret, lin ⊇ vis, lin ⊇ hb, vis ⊇ hb} – see Sect. 2. We consider linearizability in order to measure the overhead of checking weak consistency due to visibility enumeration; the second model is simply the easiest weak-consistency model to support with our implementation; the choice among possible weak-consistency models appears fairly arbitrary, since the enumeration of visibility relations is common to all.

We consider several measurements, the results of which are listed in Figs. 7 and 8; all times are measured in milliseconds on a logarithmic scale on a 2.7 GHz Intel Core i5 MacBook Pro with Oracle's Java SE runtime version 9; and

**Fig. 7.** Empirical comparison of (left) standard linearizability checking versus just-intime linearizability checking on concurrent traces of Java data structures; and (right) weak-consistency checking versus standard linearizability checking. Each point reflects the time in milliseconds for checking a given trace.

timeouts are set to 1000 ms. We note that while accurately *recording* operation timings within an execution without interference is challenging, timing the *validation* of each recorded history, which we report here, is accomplished accurately and without interference, by computing the clock difference just before and after validation.

Our first measurements establish the baseline linearizability and weak-consistency checking algorithms. On the left side of Fig. 7 we consider the time required to check linearizability for each history by our own implementations of Wing and Gong's standard enumerative approach [38], along with Lowe's "just-in-time linearizability" algorithm [29] – see Sect. 4. We resolve the nondeterminism in these algorithms (e.g., in choosing which pending operation to attempt linearizing first) arbitrarily (e.g., first called), finding no clear winner: each algorithm performs better on some histories. Since these subtleties are outside the scope of our work, we avoid further investigation and choose Wing and Gong's algorithm as our baseline linearizability-checking algorithm.

Our second measurement exposes the overhead of enumerating visibility relations for checking weak consistency. On the right side of Fig. 7 we consider the time required to check weak consistency of a given history versus the time required to check its linearizability.<sup>6</sup> We observe an overhead of approximately 10× due to visibility enumeration and validation. Our naïve implementation enumerates candidate visibilities in size-decreasing order since we expect visibility loss to be the exception rather than the rule; for instance, atomic operations observe all linearized-before operations. We omit the analogous comparison between weak-consistency checking and just-in-time linearizability checking to avoid redundancy, since the just-in-time optimization is a seemingly insignificant factor in our experiments: the results are nearly identical.

<sup>6</sup> Due to a benign error in the decoding of the results of stress testing, we observe one single point on which the two algorithms conflict – labeled "Unknown".

**Fig. 8.** Empirical comparison of (left) standard weak-consistency checking versus minimal-visibility weak-consistency checking on concurrent traces of Java data structures; and (right) the latter versus standard linearizability checking. Each point reflects the time in milliseconds for checking a given trace.

Our third measurement demonstrates the impact of our minimal-visibility consistency checking optimization. On the left side of Fig. 8 we consider the time required to check weak consistency without and with our optimization. The difference is dramatic, with our optimized algorithm consistently outperforming, sometimes by multiple orders of magnitude: the leftmost 1000 ms timeout of the naïve algorithm is matched by a roughly 18 ms positive identification. Finally, our fourth measurement, on the right side of Fig. 8, demonstrates that the overhead of our minimal-visibility checking algorithm over linearizability checking is quite modest: we observe roughly a 2× overhead, compared with the observed 10× overhead without optimization.

While our experiments clearly demonstrate the efficacy of our minimal-visibility consistency checking algorithm, we will continue to evaluate this optimization across a wide range of concurrent objects, consistency models, and client programs, e.g., including many more concurrent threads. While we do expect the performance of linearizability and weak-consistency checking to vary with thread count, we expect the performance gains of minimal-visibility consistency checking to continue to hold.

#### **6 Related Work**

Herlihy and Wing [22] described linearizability, which is the standard consistency criterion for shared-memory concurrent objects. Motivated by replication-based distributed systems, Burckhardt et al. [9,11] describe a more general axiomatic framework for specifying weaker consistencies like eventual consistency [36] and causal consistency [2]. Our weak consistency checking algorithm applies to consistency models described in this framework.

While several static techniques have been developed to prove linearizability [1,4,6,12,13,21,22,24,26,27,30–34,37,39], few have addressed dynamic techniques such as testing and runtime verification. The works in [29,38] describe monitors for checking linearizability that construct linearizations of a given history incrementally, in an online fashion. Line-Up [10] performs systematic concurrency testing via schedule enumeration, and offline linearizability checking via linearization enumeration. Our weak consistency checking algorithm combines these approaches with an efficient enumeration of visibility relations. The works in [15,16] propose a symbolic enumeration of linearizations based on a SAT solver. Although more efficient in practice, this approach applies only to certain ADTs. In this work, we propose a generic approach that assumes no constraints on the sequential semantics of the concurrent objects.

Bouajjani et al. [7] consider the problem of verifying causal consistency. They propose an algorithm for checking whether a given execution satisfies causal consistency, but only for the key-value map ADT with simple put and get operations. Our work proposes a generic algorithm that can deal with various weak consistency criteria and ADTs.

From the complexity standpoint, Gibbons and Korach [18] showed that monitoring even the single-value register type for linearizability is NP-hard. Alur et al. [3] showed that checking linearizability of all executions of a given implementation is in EXPSPACE when the number of concurrent operations is bounded, and then Hamza [20] established EXPSPACE-completeness. Bouajjani et al. [5] showed that the problem becomes undecidable once the number of concurrent operations is unbounded. Also, Bouajjani et al. [7,8] investigate various ADTs for which the problems of checking eventual and causal consistency are decidable.

#### **7 Conclusion**

We have developed the first completely automatic algorithm for checking weak consistency of arbitrary concurrent object implementations which avoids the naïve enumeration of all possible visibility relations. While methodologies for constructing reliable yet weakly-consistent implementations are relatively immature, we believe that such implementations will continue to be important for the development of distributed and multicore software systems. Likewise, automation for testing and verifying such implementations is, and will increasingly be, important. Besides improving state-of-the-art verification algorithms, our results represent an important step for future research, which may find other ways to exploit the soundness of considering only minimal visibilities, on which our optimized algorithm relies.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **Monitoring CTMCs by Multi-clock Timed Automata**

Yijun Feng<sup>1</sup>, Joost-Pieter Katoen2(B) , Haokun Li1(B), Bican Xia1(B) , and Naijun Zhan3,4(B)

<sup>1</sup> LMAM and School of Mathematical Sciences, Peking University, Beijing, China ker@protonmail.ch, xbc@math.pku.edu.cn

<sup>2</sup> RWTH Aachen University, Aachen, Germany

katoen@cs.rwth-aachen.de

<sup>3</sup> State Key Laboratory of Computer Science, Institute of Software,

Chinese Academy of Sciences, Beijing, China

znj@ios.ac.cn

<sup>4</sup> University of Chinese Academy of Sciences, Beijing, China

**Abstract.** This paper presents a numerical algorithm to verify continuous-time Markov chains (CTMCs) against multi-clock deterministic timed automata (DTA). These DTA allow for specifying properties that cannot be expressed in CSL, the logic for CTMCs used by state-of-the-art probabilistic model checkers. The core problem is to compute the probability of timed runs by the CTMC C that are accepted by the DTA A. These likelihoods equal reachability probabilities in an embedded piecewise deterministic Markov process (EPDP) obtained as the product of C and A's region automaton. This paper provides a numerical algorithm to efficiently solve the partial differential equations (PDEs) describing these reachability probabilities. The key insight is to solve an ordinary differential equation (ODE) that exploits the specific characteristics of the product EPDP. We analyze the numerical precision of our algorithm and present experimental results with a prototypical implementation.

#### **1 Introduction**

Continuous-time Markov chains (CTMCs) [17] are ubiquitous. They are used to model safety-critical systems like communication networks and power management systems, are key to performance and dependability analysis, and naturally describe chemical reaction networks. The algorithmic verification of CTMCs has received quite some attention. Aziz *et al.* [3] proved that verifying CTMCs against CSL (Continuous Stochastic Logic) is decidable. CSL is a probabilistic and timed branching-time logic that allows for expressing properties like "is the probability of a given chemical reaction within 50 time units at least 10<sup>−3</sup>?". Baier *et al.* [5] gave efficient numerical algorithms for CSL model checking that nowadays provide the basis of CTMC model checking in PRISM [23], MRMC [22] and Storm [15], as well as GreatSPN [2]. Extensions of CSL to cascaded timed-until operators [27], conditional probabilities [19], and (simple) timed regular expressions [4] have been considered.

© The Author(s) 2018

H. Chockler and G. Weissenbacher (Eds.): CAV 2018, LNCS 10981, pp. 507–526, 2018. https://doi.org/10.1007/978-3-319-96145-3_27

This paper considers the verification of CTMCs against *linear-time* real-time properties. These include relevant properties in the design of a gas burner [28], like "the probability that the duration of leaking is more than one twentieth over an interval with a length of more than 20 s is less than 10<sup>−6</sup>". Such real-time properties can be conveniently expressed by deterministic timed automata (DTA) [1]. The core problem in the verification of a CTMC C against a DTA A is to compute the probability of C's timed runs that are accepted by A, i.e. Pr(C |= A). Chen *et al.* [10,11] showed that this quantity equals the reachability probability in a piecewise deterministic Markov process (PDP) [14]. This PDP is obtained by taking the product of the CTMC C and the region automaton of A. Computing reachability probabilities in PDPs is a challenge.

Practical implementations of verifying CTMCs against DTA specifications are rare. Barbot *et al.* [7] showed that for *single-clock* DTA, the PDP is in fact a Markov regenerative process. (This observation is also at the heart of model checking CSL<sup>TA</sup> [16].) This implies that for single-clock DTA, off-the-shelf CSL model-checking algorithms can be employed, resulting in an efficient procedure [7]. Mikeev *et al.* [24] generalised these ideas to infinite-state CTMCs obtained from stoichiometric equations, whereas Chen *et al.* [12] extended the theory of verifying single-clock DTA to continuous-time Markov decision processes.

*Multi-clock* DTA are, however, much harder to handle. The characterisation of PDP reachability probabilities as the unique solution of a set of partial differential equations (PDEs) [10,11] does not give insight into an efficient computational procedure. With the notable exception of [25], verifying PDPs has not been considered. Fu [18] provided an algorithm to approximate the probabilities using finite difference methods and gave an error bound. This method hampers scalability and therefore was never implemented. The same holds for model checking using other linear-time real-time formalisms such as MTL and timed automata [9], linear duration invariants [8], and probabilistic duration calculus [13]. All these multi-clock approaches suffer from scalability issues due to the low efficiency of solving PDEs and/or integral equations, on which they heavily depend.

This paper presents a numerical technique to approximate the reachability probability in the product PDP. The DTA A is approximated by the DTA A[t<sub>f</sub>], which extends A with an additional clock that is never reset and that needs to be at most t<sub>f</sub> when accepting. By increasing the time bound t<sub>f</sub>, the DTA A[t<sub>f</sub>] approximates A arbitrarily closely. We show that the set of PDEs characterizing the reachability probability in the embedded PDP of C and A[t<sub>f</sub>] can be reduced to solving an ordinary differential equation (ODE). The specific characteristics of the product EPDP, in particular the fact that all clocks run at the same pace, are key to obtaining these ODEs. Our numerical algorithm to solve the ODEs is based on computing the approximations in a backward manner using t<sub>f</sub> and the sum of all clocks. The complexity of the resulting procedure is linear in the EPDP size, and exponential in t<sub>f</sub>/δ, where δ is the discretization step size. We show that the approximations converge to the real solution of the ODEs at a speed linear in δ. Using a prototypical tool implementation we present some results on a number of case studies such as robot navigation with a varying number of clocks in their specifications. The experimental results are promising for checking CTMCs against multi-clock DTA.

**Organization of the Paper.** Section 2 introduces basic notions including CTMCs, DTA, and PDPs. Section 3 presents the product of a CTMC and the region graph of a DTA and shows that this is an embedded PDP. Section 4 derives the PDEs (fixing a flaw in [10]), reduces them to a set of ODEs, and presents the numerical algorithm to solve these ODEs. Section 5 presents the experimental results and Sect. 6 concludes.

#### **2 Preliminaries**

In this section, we introduce some basic notions which will be used later.

A probability space is denoted by a triple (Ω, F, *Pr*), where Ω is a set of samples, F is a σ-algebra over Ω, and *Pr* : F → [0, 1] is a probability measure on F with *Pr*(Ω) = 1. Let Pr(Ω) denote the set of all probability measures over Ω. For a random variable X on the probability space, its expectation is denoted by E(X).

#### **2.1 Continuous-Time Markov Chain (CTMC)**

**Definition 1 (CTMC).** *A CTMC is a tuple* C = (S, **P**, α, *AP*, L, E)*, where*


We denote by $s \xrightarrow{t} s'$ a transition from state s to state s′ after residing in state s for t time units. The probability of the occurrence of this transition within t time units is $\mathbf{P}(s, s') \cdot \int_0^t E(s)\, e^{-E(s)x}\, dx$, where $\int_0^t E(s)\, e^{-E(s)x}\, dx$ stands for the probability to leave state s within t time units, and **P**(s, s′) for the probability to select the transition to s′ among all transitions outgoing from s. A state s is called *absorbing* if **P**(s, s) = 1. Given a CTMC C, removing the exit rate function E results in a discrete-time Markov chain (DTMC), which is called the *embedded* DTMC of C. A CTMC C is called *irreducible* if there exists a unique stationary distribution α such that α(s) > 0 for all s ∈ S, and *weakly irreducible* if α(s) may be zero for some s ∈ S.
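Since the sojourn time in s is exponentially distributed with rate E(s), the integral above has the closed form 1 − e<sup>−E(s)t</sup>, so the transition probability is easy to evaluate numerically. A minimal Python sketch (the function name and interface are ours, for illustration):

```python
import math

def transition_prob(exit_rate, p_jump, t):
    """Probability that state s is left within t time units and the
    transition to s' is selected: P(s, s') * (1 - exp(-E(s) * t)),
    the closed form of the integral above."""
    return p_jump * (1.0 - math.exp(-exit_rate * t))
```

For instance, with E(s) = 2, P(s, s′) = 0.5 and t = 1 the value is 0.5 · (1 − e<sup>−2</sup>) ≈ 0.43.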

**Definition 2 (CTMC Path).** *Let* C *be a CTMC. A path* ρ *of* C *starting from* s<sub>0</sub> *with length* n *is a sequence* $\rho = s_0 \xrightarrow{t_0} s_1 \xrightarrow{t_1} \cdots \xrightarrow{t_{n-1}} s_n \in S \times (\mathbb{R}_{>0} \times S)^n$*. The set of paths in* C *with length* n *is denoted by Path*<sup>C</sup><sub>n</sub>*; the set of all finite paths of* C *is Path*<sup>C</sup><sub>fin</sub> = ∪<sub>n</sub> *Path*<sup>C</sup><sub>n</sub> *and the set of infinite paths of* C *is Path*<sup>C</sup><sub>inf</sub> = (S × ℝ<sub>>0</sub>)<sup>ω</sup>*. We use Path*<sup>C</sup> = *Path*<sup>C</sup><sub>fin</sub> ∪ *Path*<sup>C</sup><sub>inf</sub> *to denote all paths in* C*. As a convention,* ε *stands for the empty path.*

Note that we assume the time to exit a state is strictly greater than 0. For an infinite path ρ, we use *Pref*(ρ) to denote the set of its finite prefixes. For a (finite or infinite) path ρ with prefix $s_0 \xrightarrow{t_0} s_1 \xrightarrow{t_1} \cdots$, the trace of the path is the sequence of states *trace*(ρ) = s<sub>0</sub>s<sub>1</sub>.... Let ρ(n) = s<sub>n</sub> be the n-th state in the path and ρ[n] = t<sub>n</sub> the corresponding exit time for s<sub>n</sub>. For a finite path $\rho = s_0 \xrightarrow{t_0} s_1 \xrightarrow{t_1} \cdots \xrightarrow{t_{n-1}} s_n$, we use $T(\rho) = \sum_{i=0}^{n-1} t_i$ to denote the total time spent on this path if n ≥ 1, otherwise T(ρ) = 0. For a time t ≤ T(ρ), ρ(0 ...t) denotes the prefix of ρ within t time units, i.e., $s_0 \xrightarrow{t_0} s_1 \xrightarrow{t_1} \cdots \xrightarrow{t_{m-1}} s_m$ if there exists some m ≤ n with $\sum_{i=0}^{m-1} \rho[i] \le t$ and $\sum_{i=0}^{m} \rho[i] > t$, otherwise ε.
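These path operations are straightforward to compute. A sketch with finite paths encoded as alternating state/sojourn-time lists (an encoding of our own, for illustration):

```python
def total_time(path):
    """T(rho): total time of a finite path encoded as
    [s0, t0, s1, t1, ..., s_n]."""
    return sum(path[1::2])

def prefix(path, t):
    """rho(0...t): the prefix within t time units, i.e. the prefix
    ending in the state occupied at time t; returns [] (the empty
    path) when no index m satisfies the condition above."""
    times = path[1::2]
    acc = 0.0
    for m in range(len(times)):
        if acc <= t < acc + times[m]:
            return path[: 2 * m + 1]   # s0 t0 ... t_{m-1} s_m
        acc += times[m]
    return []
```

For example, on the path s<sub>0</sub> −1→ s<sub>1</sub> −2→ s<sub>2</sub>, the prefix within 1.5 time units ends in s<sub>1</sub>.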

A basic cylinder set C(s<sub>0</sub>, I<sub>0</sub>, ..., I<sub>n−1</sub>, s<sub>n</sub>) consists of all paths ρ ∈ *Path*<sup>C</sup> such that ρ(i) = s<sub>i</sub> for 0 ≤ i ≤ n, and ρ[i] ∈ I<sub>i</sub> for 0 ≤ i < n. The σ-algebra F<sub>s<sub>0</sub></sub>(C) associated with the CTMC C and initial state s<sub>0</sub> is then the smallest σ-algebra that contains all cylinder sets C(s<sub>0</sub>, I<sub>0</sub>, ..., I<sub>n−1</sub>, s<sub>n</sub>) with α(s<sub>0</sub>) > 0 and **P**(s<sub>i−1</sub>, s<sub>i</sub>) > 0 for 1 ≤ i ≤ n, where I<sub>0</sub>,...,I<sub>n−1</sub> are non-empty intervals in ℝ<sub>≥0</sub>. There is a unique probability measure *Pr*<sup>C</sup> on the σ-algebra F<sub>s<sub>0</sub></sub>(C), by which the probability of a cylinder set is given by

$$\Pr{}^{\mathcal{C}}(C(s_0, I_0, \cdots, I_{n-1}, s_n)) = \alpha(s_0) \cdot \prod_{i=1}^{n} \int_{I_{i-1}} E(s_{i-1}) \, e^{-E(s_{i-1})x} \, dx \cdot \mathbf{P}(s_{i-1}, s_i)$$
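Each factor of this product has the closed form ∫<sub>[a,b]</sub> E e<sup>−Ex</sup> dx = e<sup>−Ea</sup> − e<sup>−Eb</sup>, so the measure of a cylinder set can be computed directly. A Python sketch under an encoding of our own:

```python
import math

def cylinder_prob(alpha0, rates, jumps, intervals):
    """Measure of the cylinder set C(s0, I0, ..., I_{n-1}, s_n):
    alpha0 = alpha(s0), rates[i] = E(s_i), jumps[i] = P(s_i, s_{i+1}),
    intervals[i] = I_i as a pair (a, b). Each factor integrates
    E(s_i) exp(-E(s_i) x) over I_i in closed form and multiplies by
    the jump probability (illustrative encoding)."""
    prob = alpha0
    for E, p, (a, b) in zip(rates, jumps, intervals):
        prob *= (math.exp(-E * a) - math.exp(-E * b)) * p
    return prob
```

For instance, with a single transition of rate 2 taken with probability 0.5 within I<sub>0</sub> = [0, 1], the measure is 0.5 · (1 − e<sup>−2</sup>).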

*Example 1.* An example of a CTMC is shown in Fig. 1, with *AP* = {a, b, c} and initial state s<sub>0</sub>. The exit rates r<sub>i</sub>, i = 0, 1, 2, 3, and the transition probabilities are shown in the figure.

**Fig. 1.** An example of CTMC

#### **2.2 Deterministic Timed Automaton (DTA)**

A timed automaton is a finite state graph equipped with a finite set of non-negative real-valued clock variables, or clocks for short. Clocks can only be reset to zero, or proceed with rate 1 as time progresses, independently of each other. Let 𝒳 = {x<sub>1</sub>,...,x<sub>n</sub>} be a set of clocks. An 𝒳-valuation η : 𝒳 → ℝ<sub>≥0</sub> records for each clock the amount of time since its last reset. Let *Val*(𝒳) be the set of all clock valuations. For a subset X ⊆ 𝒳, the reset of X, denoted as η[X := 0], is the valuation η′ such that η′(x) = 0 for all x ∈ X, and η′(x) = η(x) otherwise. For d ∈ ℝ<sub>>0</sub>, (η + d)(x) = η(x) + d for any clock x ∈ 𝒳.

A clock constraint over X is a formula with the following form

$$g := x < c \mid x \le c \mid x > c \mid x \ge c \mid x - y \ge c \mid g \land g$$

where x, y are clocks and c ∈ ℕ. Let *Con*(𝒳) denote the set of clock constraints over 𝒳. A valuation η satisfies a guard g, denoted as η |= g, iff η(x) ⋈ c when g is x ⋈ c, where ⋈ ∈ {<, ≤, >, ≥}; and η |= g<sub>1</sub> ∧ g<sub>2</sub> iff η |= g<sub>1</sub> and η |= g<sub>2</sub>.
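The operations on valuations (delay, reset, guard satisfaction) are simple enough to sketch directly; the encoding of guards as triples (clock, comparison, constant) is ours, and it omits diagonal constraints x − y ≥ c for brevity:

```python
import operator

# comparison operators allowed in atomic clock constraints
OPS = {'<': operator.lt, '<=': operator.le, '>': operator.gt, '>=': operator.ge}

def delay(eta, d):
    """(eta + d): all clocks proceed uniformly by d time units."""
    return {x: v + d for x, v in eta.items()}

def reset(eta, xs):
    """eta[X := 0]: clocks in X are reset to zero, others keep their value."""
    return {x: (0.0 if x in xs else v) for x, v in eta.items()}

def satisfies(eta, guard):
    """eta |= g for a conjunction of atomic constraints, each encoded
    as (clock, comparison, constant) -- an illustrative encoding of
    Con(X) without diagonal constraints."""
    return all(OPS[op](eta[x], c) for x, op, c in guard)
```

For example, after a delay of 1.5, the valuation satisfies x > 1, and resetting {x} brings x back to 0 while y keeps its value.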

**Definition 3 (DTA).** *A DTA is a tuple* A = (Σ, X , Q, q0, Q<sup>F</sup> , →)*, where*


Each transition relation, or edge, q → q′ in A is endowed with (a, g, X), where a ∈ Σ is an action, g ∈ *Con*(𝒳) is the guard of the transition, and X ⊆ 𝒳 is a set of clocks which are reset to 0 after the transition. The intuitive interpretation of the transition is that A can move from q to q′ by taking action a and resetting all clocks in X to 0 only if g is satisfied. There are no outgoing transitions from any accepting location in Q<sub>F</sub>.

A finite timed path of A is of the form $\theta = q_0 \xrightarrow{a_0,t_0} q_1 \xrightarrow{a_1,t_1} \cdots \xrightarrow{a_{n-1},t_{n-1}} q_n$, where t<sub>i</sub> ≥ 0 for i = 0,...,n−1. Moreover, there exists a sequence of transitions $q_j \xrightarrow{a_j,g_j,X_j} q_{j+1}$ for 0 ≤ j ≤ n − 1, such that η<sub>0</sub> = **0**, η<sub>j</sub> + t<sub>j</sub> |= g<sub>j</sub> and η<sub>j+1</sub> = (η<sub>j</sub> + t<sub>j</sub>)[X<sub>j</sub> := 0], where η<sub>k</sub> denotes the clock valuation when entering q<sub>k</sub>. θ is said to be *accepted by* A if there exists a state q<sub>i</sub> ∈ Q<sub>F</sub> for some 0 ≤ i ≤ n. As usual, it is assumed that all DTA are non-Zeno [6], that is, any circular transition sequence takes nonzero dwelling time.

A region is a set of valuations, usually represented by a set of clock constraints. Let *Reg*(X) be the set of regions over X. Given Θ, Θ′ ∈ *Reg*(X), Θ′ is called a *successor* of Θ if for all η |= Θ there exists t > 0 such that η + t |= Θ′ and, for all t′ < t, η + t′ |= Θ ∨ Θ′. A region Θ satisfies a guard g, denoted Θ |= g, iff η |= Θ implies η |= g for every valuation η. The reset operation on a region Θ is defined as Θ[X := 0] = {η[X := 0] | η |= Θ}. The region graph, viewed as the quotient transition system induced by clock equivalence [6], can then be defined as follows:

**Definition 4 (Region Graph).** *The region graph for a DTA* A = (Σ, X, Q, q_0, Q_F, →) *is a tuple* G(A) = (Σ, X, Q, q_0, Q_F, →)*, where*

	- (q, Θ) --(a,X)--> (q′, Θ′) *if there exist* g ∈ *Con*(X) *and a transition* q --(a,g,X)--> q′ *such that* Θ |= g *and* Θ′ = Θ[X := 0]*.*

*Example 2 (Adapted from* [10]*).* Figure 2 presents an example of a DTA and Fig. 3 gives its region graph, in which double circles and double rectangles stand for final states.

**Fig. 2.** A DTA A **Fig. 3.** The region graph of A

#### **2.3 Piecewise-Deterministic Markov Process (PDP)**

Piecewise-deterministic Markov processes (PDPs for short) [14] cover a wide range of stochastic models in which randomness appears as discrete events at fixed or random times, while the evolution between these events is deterministically governed by an ODE system. A PDP consists of a mixture of deterministic motion and random jumps between a finite set of locations. While staying in a location, a PDP evolves deterministically according to a flow function, which is the solution to an ODE system. A PDP can jump between locations either randomly, in which case the residence time in a location is governed by an exponential distribution, or when the location invariant is violated. The successor state of a jump follows a probability measure depending on the current state. A PDP is right-continuous and has the strong Markov property [14].

**Definition 5 (PDP** [14]**).** *A PDP is a tuple* Q = (Z, X ,*Inv*, φ, Λ, μ) *with*


For any ξ = (z, η) ∈ S, there is a δ(ξ) > 0 such that Λ(z, φ(z, η, t)) is integrable on [0, δ(ξ)). μ(ξ)(A) is measurable for any A ∈ F(S), where F(S) is the smallest σ-algebra generated by {⋃_{z∈Z} {z} × A_z | A_z ∈ F(*Inv*(z))}, and μ(ξ)({ξ}) = 0.

There are two ways to take transitions between locations in a PDP Q. Q is allowed to stay in the current location z only as long as *Inv*(z) is satisfied. During its residence, the valuation η evolves time-dependently according to the flow function. Let ξ ⊕ t = (z, φ(z, η, t)) be the successor state of ξ = (z, η) after residing t time units in z. Thus, Q is piecewise-deterministic, since within each location its behavior is determined by the flow function φ. In a state ξ = (z, η) with η |= *Inv*(z)°, the PDP Q can either evolve to a state ξ′ = ξ ⊕ t by delaying t time units, or take a Markovian jump to ξ′ = (z′, η′) ∈ S with probability μ(ξ)({ξ′}). When η |= ∂*Inv*(z), Q is forced to take a boundary jump to ξ′ = (z′, η′) ∈ S with probability μ(ξ)({ξ′}).

#### **3 Reduction to the Reachability Probability of EPDP**

As proved in [10], model checking a given CTMC C against a linear real-time property expressed by a DTA A, i.e., determining *Pr*(C |= A), can be reduced to computing the reachability probability of the product of C and G(A). This can be further reduced to computing the reachability probability of the embedded PDP (EPDP) of the product. How to efficiently compute the reachability probability of the EPDP remains challenging, however, as existing approaches [7,10,16] can only handle DTA with one clock. We attack this challenge in this paper. To keep the presentation self-contained, we reformulate the reduction reported in [10] in this section.

A path ρ = s_0 --t_0--> s_1 --t_1--> ... of CTMC C is accepted by DTA A if ρ̂ = q_0 --(L(s_0),t_0)--> q_1 --(L(s_1),t_1)--> ... --(L(s_{n−1}),t_{n−1})--> q_n, induced by some prefix of ρ, is an accepting path of A. Then *Pr*(C |= A) = *Pr*{ρ ∈ *Path*^C | ρ is accepted by A}.

**Definition 6 (Product Region Graph** [7]**).** *The product of a CTMC* C = (S, **P**, α, *AP*, L, E) *and the region graph* G(A) = (Σ, X, Q, q_0, Q_F, →) *of a DTA* A*, denoted* C⊗G(A)*, is a tuple* (X, V, α′, V_F, →, Λ)*, where*


*–* Λ : V → R_{>0} *is the exit rate function, where*

$$\Lambda(s,\overline{q}) = \begin{cases} E(s) & \text{if there exists a Markovian transition from } (s,\overline{q}),\\ 0 & \text{otherwise.} \end{cases}$$

*Remark 1.* Note that the definition of region graph here is slightly different from the usual one in the sense that Markovian transitions starting from a boundary do not contribute to the reachability probability. Therefore we can merge the boundary into its unique delay successor.

*Example 3 (Adapted from* [10]*).* Figure 4 shows the product region graph of the CTMC C in Example 1 and the DTA A in Example 2. The graph can be split into three subgraphs in a column-wise manner: all transitions within a subgraph are probabilistic, all transitions into the next subgraph are delay transitions, and transitions with resets lead back to a state in the first subgraph. For conciseness, the location v_9 stands for all nodes that may be reached by a Markovian transition yet cannot reach an accepting node.

**Proposition 1 (**[10]**).** *For a CTMC* C *and a DTA* A*, Pr*(C |= A) *is measurable and Pr*(C |= A) = *Pr*^{C⊗G(A)}{*Path*^{C⊗G(A)}(♦Q_F)}*.*

**Fig. 4.** Product region graph C⊗G(A) of CTMC C in Example 1 and DTA A in Example 2

When treated as a stochastic process, C⊗G(A) can be interpreted as a PDP. In this way, computing the reachability probability of Q<sup>F</sup> in C⊗G(A) can be reduced to computing the time-unbounded reachability probability in the EPDP of C⊗G(A).

**Definition 7 (EPDP** [7]**).** *Given* C⊗G(A) = (X, V, α′, V_F, →, Λ)*, the EPDP* Q^{C⊗A} *is a tuple* (X, V, *Inv*, φ, Λ, μ)*, where for any* v = (s, (q, Θ)) ∈ V


The flow function here states that all clocks increase at a uniform rate (i.e., ẋ_1 = 1, ..., ẋ_n = 1, or simply Ẋ = 1) at all locations. The original reachability problem is then reduced to computing the probability of reaching the set {(v, η) | v ∈ V_F, η |= *Inv*(v)} from the initial state (v_0, **0**) in the EPDP Q^{C⊗A}. Let Pr_v^{Q^{C⊗A}}(η) stand for the probability of reaching the final states (V_F × ∗) from (v, η) in Q^{C⊗A}. Then Pr_v^{Q^{C⊗A}}(η) can be computed recursively by

$$Pr_v^{\mathcal{Q}^{\mathcal{C}\otimes\mathcal{A}}}(\eta) = \begin{cases} Pr_{v,\lambda}^{\mathcal{Q}^{\mathcal{C}\otimes\mathcal{A}}}(\eta) + \sum_{v \xrightarrow{p,X} v'} Pr_{v,v'}^{\mathcal{Q}^{\mathcal{C}\otimes\mathcal{A}}}(\eta) & \text{if } v \notin V_F,\\ 1 & \text{if } v \in V_F \wedge \eta \models Inv(v),\\ 0 & \text{otherwise.} \end{cases} \tag{1}$$

Let t_z^*(v, η) denote the minimal time for Q^{C⊗A} to reach ∂*Inv*(v) from (v, η). More precisely,

$$t_z^*(v, \eta) = \inf\{t \mid \phi(v, \eta, t) \models \partial Inv(v)\}.$$

Pr_{v,λ}^{Q^{C⊗A}}(η) is the probability of delaying in v, then taking a forced jump to (v′, η + t_z^*(v, η)), and afterwards evolving to an accepting state; it can be computed recursively by

$$Pr_{v,\lambda}^{\mathcal{Q}^{\mathcal{C}\otimes\mathcal{A}}}(\eta) = \exp(-\Lambda(v)\, t_z^*(v,\eta)) \cdot Pr_{v'}^{\mathcal{Q}^{\mathcal{C}\otimes\mathcal{A}}}(\eta + t_z^*(v,\eta)).$$

Pr_{v,v'}^{Q^{C⊗A}}(η) is the probability that a Markovian transition v --(p,X)--> v′ happens within t_z^*(v, η) time units and the process afterwards evolves to an accepting state; it can be computed recursively by

$$Pr_{v,v'}^{\mathcal{Q}^{\mathcal{C}\otimes\mathcal{A}}}(\eta) = \int_0^{t_z^*(v,\eta)} p \cdot \Lambda(v) \exp(-\Lambda(v)s) \cdot Pr_{v'}^{\mathcal{Q}^{\mathcal{C}\otimes\mathcal{A}}}((\eta + s)[X := 0]) \, ds.$$

Computing *Pr*(C |= A) thus reduces to computing Pr_{v_0}^{Q^{C⊗A}}(**0**), which equals the least fixed point of Eq. (1). That is,

**Theorem 1 (**[10]**).** *For a CTMC* C *and a DTA* A*, Pr*(C |= A) = *Pr*^{C⊗A}{*Path*^{C⊗A}(♦Q_F)} *is the least fixed point of (1).*

*Remark 2.* In general, it is difficult to solve a recursive equation like (1). As an alternative, we consider the augmented EPDP obtained from Q^{C⊗A} by replacing A with a bounded DTA derived from A. Using the extended generator of the augmented EPDP, we can then derive a partial differential equation (PDE) whose solution is the reachability probability. We elaborate on this idea in the subsequent section.

### **4 Approximating the Reachability Probability of EPDP**

In this section, we present a numerical method to approximate Pr_{v_0}^{Q^{C⊗A}}(**0**), since, as discussed above, computing it exactly is in general impossible, or at least too expensive. We first introduce the basic idea of our approach in detail, then discuss its time complexity and convergence. A key point is that our approach exploits the observation that the flow function of Q^{C⊗A} is linear, depends only on the time t, and is the same at all locations. This enables us to reduce computing Pr_{v_0}^{Q^{C⊗A}}(**0**) to solving an ODE system.

#### **4.1 Reduction to a PDE System**

In this subsection, we first show that Pr_{v_0}^{Q^{C⊗A}}(**0**) can be approximated by the corresponding probability for the EPDP of C and a bounded DTA derived from A, i.e., a DTA in which the duration of every accepting path is bounded. We then show that the latter can be reduced to solving a PDE system.

Given a DTA A, we construct a bounded DTA A[t_f] by introducing a new clock y, adding the timing constraint y < t_f to the guard of each transition of A entering an accepting state in Q_F, and never resetting y, where t_f ∈ N is a parameter. Hence, every accepting path of A[t_f] is time-bounded by t_f. Obviously, *Path*^C(A[t_f]) is a subset of *Path*^C(A). As *Pr*(C |= A) is measurable and Q^{C⊗A} is Borel right-continuous, we have the following proposition.
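The construction of A[t_f] is purely syntactic; a sketch (with a hypothetical transition encoding of our own, not the paper's) might look like:

```python
def bound_dta(transitions, finals, tf, new_clock='y'):
    """Build A[tf] from A: transitions is a list of
    (source, action, guard, resets, target), with a guard given as a list of
    (clock, op, constant) triples. The constraint new_clock < tf is added to
    every transition entering an accepting location; new_clock is never reset."""
    bounded = []
    for (q, a, guard, resets, q_next) in transitions:
        assert new_clock not in resets        # the new clock y is never reset
        if q_next in finals:
            guard = guard + [(new_clock, '<', tf)]
        bounded.append((q, a, guard, resets, q_next))
    return bounded

trans = [('q0', 'a', [('x', '<=', 1)], set(), 'qf'),
         ('q0', 'b', [], {'x'}, 'q0')]
print(bound_dta(trans, {'qf'}, 10))
# only the edge into qf gains the extra guard ('y', '<', 10)
```
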

**Proposition 2.** *Given a CTMC* C*, a DTA* A*, and* t_f ∈ N*,*

$$\lim_{t_f \to \infty} Pr(\mathcal{C} \models \mathcal{A}[t_f]) = Pr(\mathcal{C} \models \mathcal{A}). \tag{2}$$

*Moreover, if* C *is weakly irreducible or satisfies certain conditions (see Chap. 4 of* [26] *for details), then there exist positive constants* K, K_0 ∈ R_{≥0} *such that*

$$Pr(\mathcal{C} \models \mathcal{A}) - Pr(\mathcal{C} \models \mathcal{A}[t_f]) \le K \exp\{-K_0 t_f\}. \tag{3}$$

*Remark 3.* Equation (2) was first observed in [7], where the authors pointed out the feasibility of using a bounded system to approximate the original unbounded system in order to simplify a verification obligation. Inequality (3) further indicates that this approximation converges exponentially in t_f if the CTMC is weakly irreducible.

For a path starting in a state (v, η) at time y, we use *Path*^y_{(v,η)}[t] to denote its location at time t, and define ℏ_v(y, η) = *Pr*(*Path*^y_{(v,η)}[t_f] ∈ V_F) = E(**1**_{*Path*^y_{(v,η)}[t_f] ∈ V_F}) as the probability that such a path reaches V_F within t_f time units, where **1**_{*Path*^y_{(v,η)}[t_f] ∈ V_F} is the indicator function of the event *Path*^y_{(v,η)}[t_f] ∈ V_F. Then ℏ_{v_0}(0, **0**) = *Pr*(C |= A[t_f]) is the probability of reaching the set of accepting states from the initial state (v_0, **0**), and it satisfies the following system of PDEs.

**Theorem 2.** *Given a CTMC* C*, a bounded DTA* A[t_f]*, and the EPDP* Q^{C⊗G(A[t_f])} = (X, V, *Inv*, φ, Λ, μ)*,* ℏ_{v_0}(0, **0**) *is the unique solution of the following system of PDEs:*

$$\frac{\partial \hbar_v(y,\eta)}{\partial y} + \sum_{i=1}^{|\mathcal{X}|} \frac{\partial \hbar_v(y,\eta)}{\partial \eta^{(i)}} + \Lambda(v) \cdot \sum_{v \xrightarrow{p,X} v'} p \cdot (\hbar_{v'}(y, \eta[X:=0]) - \hbar_v(y, \eta)) = 0, \tag{4}$$

*where* v ∈ V\V_F*,* η |= *Inv*(v)*,* η^{(i)} *is the* i*-th clock variable, and* y ∈ [0, t_f)*. The boundary conditions are:*

*(i)* ℏ_v(y, η) = ℏ_{v′}(y, η)*, for every* η |= ∂*Inv*(v) *and transition* v --λ--> v′*;*
*(ii)* ℏ_v(y, η) = 1*, for every vertex* v ∈ V_F*,* η |= *Inv*(v)*, and* y ∈ [0, t_f)*;*
*(iii)* ℏ_v(t_f, η) = 0*, for every vertex* v ∈ V\V_F *and* η |= *Inv*(v) ∨ ∂*Inv*(v)*.*

*Remark 4.* Note that the PDE system (4) in Theorem 2 differs from the one presented in [10] for computing Pr_{v_0}^{Q^{C⊗A}}(**0**). In particular, the boundary condition of [10] has been corrected here.

#### **4.2 Reduction to an ODE System**

There are several classical methods for solving PDEs. The *finite element method*, a numerical technique for solving PDEs as well as integral equations, is a prominent one, of which different versions have been established for PDEs with specific properties. Other numerical methods include the finite difference method, the finite volume method, and so on; the reader is referred to [20,21] for details. Thanks to the special form of Eq. (4), we are able to obtain a numerical solution in a more efficient way.

The fact that the flow function (the solution to the ODE system ⋀_{x∈X} ẋ = 1 ∧ ẏ = 1) is the same at all locations of the EPDP Q^{C⊗A[t_f]} suggests that the partial derivatives with respect to η and y on the left-hand side of (4) evolve at the same pace. Thus, we can view all clocks as an array and reformulate (4) as

$$\left[\frac{\partial \hbar_v(y,\eta)}{\partial y}, \frac{\partial \hbar_v(y,\eta)}{\partial \eta^{(1)}}, \dots, \frac{\partial \hbar_v(y,\eta)}{\partial \eta^{(|\mathcal{X}|)}}\right] \bullet \mathbf{1} + \Lambda(v) \cdot \sum_{v \xrightarrow{p,X} v'} p \cdot (\hbar_{v'}(y,\eta[X:=0]) - \hbar_v(y,\eta)) = 0, \tag{5}$$

where • stands for the inner product of two vectors of the same dimension, i.e., (a_1, ..., a_n) • (b_1, ..., b_n) = Σ_{i=1}^n a_i b_i, and **1** for the all-ones vector (1, ..., 1) of length n.

By Theorem 2, the boundary values are determined at all points (v_0, y_0, η_0) with v_0 ∈ V_F, y_0 = t_f, or η_0 |= ∂*Inv*(v). Besides, by the definition of Q^{C⊗A[t_f]}, it follows that ∂z/∂t = 1, which implies dz = dt, for any z ∈ {y} ∪ X. Hence, we can simplify (5) to the following ODE system:

$$\frac{d\hbar_v((y_0,\eta_0)+t)}{dt} + \Lambda(v) \cdot \sum_{v \xrightarrow{p,X} v'} p \cdot \big(\hbar_{v'}\big(((y_0,\eta_0)+t)[X:=0]\big) - \hbar_v((y_0,\eta_0)+t)\big) = 0, \tag{6}$$

with the initial condition given at v_0 ∈ V_F, y_0 = t_f, and η_0 |= *Inv*(v) ∨ ∂*Inv*(v), where v ∈ V\V_F. Note that we solve (6) backwards to compute the reachability probability.
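To make the backward computation concrete, consider a toy product with a single non-accepting location v that takes a Markovian transition (rate Λ, probability 1, no resets) to an accepting location. Along the backward direction, (6) reduces to dℏ/dy = −Λ·(1 − ℏ) with terminal value ℏ(t_f) = 0, whose exact solution at y = 0 is 1 − e^{−Λ t_f}. A minimal backward-Euler sketch (all names and rates are ours, for illustration only):

```python
import math

def backward_euler_reach(lam, tf, delta):
    """Solve dh/dy = -lam * (1 - h), h(tf) = 0, backwards from y = tf to y = 0.
    h(0) approximates the probability of jumping to the accepting location
    within tf time units."""
    n = round(tf / delta)
    h = 0.0                              # boundary condition (iii): h = 0 at y = tf
    for _ in range(n):
        h += delta * lam * (1.0 - h)     # one backward Euler step of length delta
    return h

lam, tf = 1.0, 5.0
exact = 1.0 - math.exp(-lam * tf)        # closed-form reachability probability
approx = backward_euler_reach(lam, tf, 0.001)
print(abs(approx - exact) < 1e-2)        # True: only an O(delta) error remains
```
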

#### **4.3 Numerical Solution**

Since ℏ_v((y_0, η_0) + t) satisfies an ODE, we can apply a discretization method to (6) and obtain an approximation efficiently. The remaining obstacle is how to deal with the reset term ℏ_{v′}(y_0 + t, (η_0 + t)[X := 0]). Notice that X ≠ ∅ implies sum((η_0 + t)[X := 0]) + (t_f − y_0 − t) < sum(η_0 + t) + (t_f − y_0 − t), where sum(η) = Σ_{x∈X} η(x). So we just need to solve the ODE system backwards, starting from (t_f, η_0), processing points in ascending order of sum(η). In this way, all reset values needed in the current iteration have already been computed in previous iterations. Therefore, in each iteration the derivative is fixed and easy to calculate.

Denote by δ the discretization step length; the total number of discretization steps is then t_f/δ ∈ N. An approximate solution to (4) can be computed efficiently by the following algorithm.

Line 4 in Algorithm 1 computes a numerical solution to (6) on [t_f − t, t_f] by discretizing dℏ_v((y_0, η_0) + t)/dt as (1/δ)(ℏ_v((y_0, η_0) + (t + δ)) − ℏ_v((y_0, η_0) + t)). A pictorial illustration of Algorithm 1 for the two-dimensional setting is shown in Fig. 5. The blue polyhedron covers all the points we need to calculate. The algorithm starts from (0, 0, t_f), where sum(η) = x_1 + x_2 = 0. Then sum(η) is incremented stepwise up to 2t_f. For each fixed sum(η), for example sum(η) = t_f, the algorithm calculates all discrete points in the gray plane following the direction (−1, −1, −1), and finally reaches the two reset lines. The red line reaching the origin provides the final result.

**Algorithm 1.** Finding a numerical solution to (4)

**Input:** C⊗G(A), the region graph of the product of CTMC C and DTA A; t_f, the time bound

**Output:** A numerical solution for ℏ_{v_0}(0, **0**), an approximation of *Pr*(C |= A[t_f])


**Fig. 5.** Illustrating Algorithm 1 (left) and Algorithm 2 (right) for the 2-dimensional setting (Color figure online)

*Example 4.* Consider the product C⊗G(A) shown in Example 3 (on page 8). For the state v_3, in which clock x is 1 and y is arbitrary, the corresponding PDE is

$$\frac{\partial \hbar_{v_3}(y,1)}{\partial y} + \frac{\partial \hbar_{v_3}(y,1)}{\partial x} + r_0\big[0.5 \cdot \hbar_{v_0}(y,0) + 0.2 \cdot \hbar_{v_4}(y,0) + 0.4 \cdot \hbar_{v_3}(y,0) - \hbar_{v_3}(y,1)\big] = 0.$$

Since sum(y, 0) = y < y + 1 = sum(y, 1), the values of ℏ_{v_0}(y, 0), ℏ_{v_4}(y, 0) and ℏ_{v_3}(y, 0) have been calculated in previous iterations, and thus the value of ℏ_{v_3}(y, 1) can be computed.

To optimize Algorithm 1 for multi-clock objectives, we exploit the idea of *lazy computation*. Algorithm 1 determines the reset terms of (6) by calculating all discretized points generated by all ODEs. This hurts efficiency, since the number of ODEs is quite large (the same as the number of states in the product automaton). In Algorithm 2, by contrast, we only compute the reset terms actually needed for computing ℏ_{v_0}(0, **0**). If we meet a reset term ℏ_{v′}(y, η[X := 0]) that has not yet been decided, we suspend the equation currently being computed and switch to the equation leading to the undecided point, following the direction (−1, ..., −1). The algorithm terminates since the number of points it computes is no more than that of Algorithm 1. The pseudo-code is given in Algorithm 2.

**Algorithm 2.** Lazy computation of a numerical solution to (4)

```
Input:  C⊗G(A), the region graph of the product of CTMC C and DTA A; tf, the time bound
Output: A numerical solution for ℏ_{v0}(0, 0), an approximation of Pr(C |= A[tf])

Procedure dh_v(y, η)              // compute the numerical solution for (y, η)
 1: for t from 0 down to −min(tf, η) by δ do
 2:   for v ∈ V do
 3:     check whether η satisfies the initial and boundary conditions of Theorem 2
 4:     for each Markovian transition v --(p,X)--> v' do
 5:       up := (−t − δ) · 1 + ((t + δ) · 1)[X := 0]
 6:       if a reset exists and η[X := 0] + up is undecided then
 7:         call dh_v(tf, η[X := 0] + up)
 8:       end if
 9:       compute ℏ_v
10:     end for
11:   end for
12:   execute λ-transitions according to Theorem 2
13:   compute ℏ_v((y0, η0) + t) by equation (6)
14: end for
15: mark η decided
End Procedure

 1: call dh_v(v0, tf, (tf))
 2: return the numerical solution for ℏ_{v0}(0, 0)
```
#### **4.4 Complexity Analysis**

Let |S| be the number of states of the CTMC and n the number of clocks of the DTA. The worst-case time complexity of Algorithms 1 and 2 is O(|V| · (t_f/δ)^{n+1}), where |V| is the number of equations in (4), i.e., the number of non-accepting locations in the product region graph. The number of states in the region graph of the DTA is bounded by n! · 2^{n−1} · Π_{x∈X}(c_x + 1), denoted C_b, where c_x is the maximum constant occurring in the guards that constrain x. Note that C_b differs from the bound given in [1], since the boundaries of a region do not matter in our setting and hence can be merged into the region. Thus, the number of states in the product region graph, and hence the number of PDEs in Theorem 2, is at most C_b · |S|. So the total complexity is O(C_b · |S| · (t_f/δ)^{n+1}).
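The bound C_b is straightforward to evaluate; a small helper (our own, for illustration) computes it from the per-clock maximum constants:

```python
from math import factorial, prod

def region_bound(max_consts):
    """C_b = n! * 2^(n-1) * prod_x (c_x + 1), for max_consts = {clock: c_x}."""
    n = len(max_consts)
    return factorial(n) * 2 ** (n - 1) * prod(c + 1 for c in max_consts.values())

print(region_bound({'x': 1}))            # 2
print(region_bound({'x': 1, 'y': 2}))    # 24
```
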

Let ℏ_{v,n}(y_0, η_0) denote the numerical solution of ODE (6) at t = −nδ, and let Λ_max = max{Λ(v_i) | 0 ≤ i ≤ |S|} and N = t_f/δ. By Proposition 2, lim_{t_f→+∞} ℏ_v(0, **0**) = *Pr*(C |= A), and ℏ_v(0, **0**) is monotonically increasing in t_f. In the following proposition, for simplicity of discussion, we assume t_f = Nδ. The error caused by discretization can then be estimated as follows:

**Proposition 3.** *For* N ∈ N^+ *and* δ = t_f/N*,*

$$|\hbar\_{v\_0,N}(t\_f, t\_f \cdot \mathbf{1}) - \hbar\_{v\_0}(0, \mathbf{0})| = \mathcal{O}(\delta).$$

For a function f(δ), f is of magnitude O(δ) if lim_{δ→0} f(δ)/δ = C for some constant C. By Proposition 3, if we view Λ_max and t_f as constants, then the error is O(δ) in the step length δ. By Proposition 2, the numerical solution produced by Algorithm 1 converges to the reachability probability of C⊗A, and the error can be made as small as desired by decreasing the discretization step δ and increasing the time bound t_f.
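The O(δ) behavior is easy to observe empirically on a single-transition instance where the exact answer 1 − e^{−Λ t_f} is known in closed form: halving δ roughly halves the error. The rates and names below are illustrative assumptions:

```python
import math

def euler_reach(lam, tf, delta):
    """Backward Euler for dh/dy = -lam * (1 - h), h(tf) = 0; returns h(0)."""
    h = 0.0
    for _ in range(round(tf / delta)):
        h += delta * lam * (1.0 - h)
    return h

lam, tf = 1.0, 5.0
exact = 1.0 - math.exp(-lam * tf)
e1 = abs(euler_reach(lam, tf, 0.02) - exact)   # error with step 0.02
e2 = abs(euler_reach(lam, tf, 0.01) - exact)   # error with step 0.01
print(1.5 < e1 / e2 < 2.5)   # True: consistent with first-order convergence
```
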

#### **5 Experimental Results**

We implemented a prototype of Algorithms 1 and 2 in C, together with a Python tool that takes a CTMC C and a DTA A as input and generates a .c file storing their product, which serves as input to Algorithms 1 and 2. The first two examples (Examples 5 and 6) come from [10] and show the feasibility of our tool. The last case study is a robot navigation example from [7]. To demonstrate the scalability of our approach, we extend that example with different real-time requirements, which require DTA with different numbers of clocks. The examples were executed on Ubuntu 16.04 LTS with an Intel(R) Core(TM) i7-4710HQ 2.50 GHz CPU and 16 GB RAM. The column "time" reports the running time of Algorithm 1, and "time (lazy)" that of Algorithm 2. All times are in seconds.

*Example 5.* Consider Example 3 with r_i = 1 for i = 0, ..., 3 and δ = 0.01; the experimental results are shown in Table 1. The relative error between t_f = 30 and t_f = 40 is 5 × 10^{−7}.


**Table 1.** The experimental results for Examples 5 and 6

*Example 6.* Consider the reachability probability for the product of a CTMC and a DTA as shown in Fig. 6; a part of its region graph is shown in Fig. 7. Setting r_0 = r_1 = 1 and δ = 0.1, the experimental results are given in Table 1. The relative error between t_f = 30 and t_f = 40 is 1 × 10^{−7}. Note that even for this simple example, none of the existing tools can handle it.

**Fig. 6.** The product automaton of Example 6 **Fig. 7.** The reachable product region graph of Fig. 6.

*Example 7.* Consider a robot moving on an N × N grid as shown in Fig. 8 (adapted from [7]). It can move up, down, left, and right, choosing each possible direction with equal probability. The cells are grouped into A, B, C, and D. We consider the following real-time constraints:


In this example, we verify whether the CTMC satisfies (i) P_1; (ii) P_1 ∧ P_2; (iii) P_1 ∧ P_2 ∧ P_3. Obviously, P_1 can be expressed by a DTA with one clock, see Fig. 9; to express P_1 ∧ P_2, a DTA with two clocks is necessary, see Fig. 10; to express P_1 ∧ P_2 ∧ P_3, a DTA with three clocks is necessary, see Fig. 11.

**Fig. 8.** An example grid **Fig. 9.** A DTA with one clock for P<sup>1</sup>

The experimental results are summarized in Table 2. The relative error between t_f = 20 and t_f = 21 is smaller than 10^{−2}. As can be seen, the running time of our approach depends heavily on the number of clocks. Compared with the

**Fig. 10.** A DTA with two clocks for P<sup>1</sup> ∧ P<sup>2</sup>

**Fig. 11.** A DTA with three clocks for P<sup>1</sup> ∧ P<sup>2</sup> ∧ P<sup>3</sup>

**Table 2.** Experimental results for the robot example with δ = 0.1; running times longer than 2700 s are denoted by 'TO' (timeout). The column "#(P)" counts the number of states in the product automaton C⊗G(A); "time([7])" is the running time of the prototype of [7] with precision = 0.01 and T_1 = T_2 = 3, T_3 = 5, T_4 = 7


results reported in [7] for the one-clock case of this case study (with the precision set to 10^{−2}), our tool is as fast as theirs, but their tool cannot handle the cases with multiple clocks. In contrast, our approach handles DTA with multiple clocks, as indicated by the verification of P_2 and P_3. Algorithm 2 is much faster than Algorithm 1 as the number of clocks grows. To the best of our knowledge, this is the first prototype tool for verifying CTMCs against multi-clock DTA.

#### **6 Concluding Remarks**

In this paper, we presented a practical approach to verifying CTMCs against DTA objectives. First, the desired probability is reduced to the reachability probability of the product region graph, interpreted as a PDP. We then use an augmented PDP to approximate this reachability probability, which coincides with the solution of a PDE system at the starting point. We further propose a numerical solution to the PDE system by reducing it to an ODE system. The experimental results indicate efficiency and scalability gains over existing work, as our approach can handle DTA with multiple clocks.

As future work, it is worth investigating whether our approach also applies to the verification of CTMCs against more complicated real-time properties, expressed either by timed automata and MTL as considered in [9], or by linear duration invariants as considered in [8].

**Acknowledgements.** This research is partly funded by the Sino-German Center for Research Promotion as part of the project CAP (GZ 1023). The work of Yijun Feng, Haokun Li and Bican Xia is partly funded by NSFC under grants No. 61732001 and 61532019; the work of Joost-Pieter Katoen is partly funded by the DFG Research Training Group 2236 UnRAVeL; the work of Naijun Zhan is partly funded by NSFC under grants No. 61625206 and 61732001, by the "973 Program" under grant No. 2014CB340701, and by the CAS/SAFEA International Partnership Program for Creative Research Teams.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **Start Pruning When Time Gets Urgent: Partial Order Reduction for Timed Systems**

Frederik M. Bønneland, Peter Gjøl Jensen, Kim Guldstrand Larsen, Marco Muñiz, and Jiří Srba(B)

Department of Computer Science, Aalborg University, Aalborg, Denmark {frederikb,pgj,kgl,muniz,srba}@cs.aau.dk

**Abstract.** Partial order reduction for timed systems is a challenging topic due to the dependencies among events induced by time acting as a global synchronization mechanism. So far, there has only been a limited success in finding practically applicable solutions yielding significant state space reductions. We suggest a working and efficient method to facilitate stubborn set reduction for timed systems with urgent behaviour. We first describe the framework in the general setting of timed labelled transition systems and then instantiate it to the case of timed-arc Petri nets. The basic idea is that we can employ classical untimed partial order reduction techniques as long as urgent behaviour is enforced. Our solution is implemented in the model checker TAPAAL and the feature is now broadly available to the users of the tool. By a series of larger case studies, we document the benefits of our method and its applicability to real-world scenarios.

#### **1 Introduction**

Partial order reduction techniques for untimed systems, introduced by Godefroid, Peled, and Valmari in the nineties (see e.g. [6]), have since long proved successful in combating the notorious state space explosion problem. For *timed* systems, the success of partial order reduction has been significantly challenged by the strong dependencies between events caused by time as a global synchronizer. Only recently—and moreover in combination with *approximate* abstraction techniques—stubborn set techniques have demonstrated a true reduction potential for systems modelled by timed automata [23].

We pursue an orthogonal solution to the current partial order approaches for timed systems and, based on a stubborn set reduction [28,39], we target a general class of timed systems with *urgent behaviour*. In a modular modelling approach for timed systems, urgency is needed to realistically model behaviour in a component that should be unobservable to other components [36]. Examples of such instantaneously evolving behaviours include, among others, cases like behaviour detection in a part of a sensor (whose duration is assumed to be negligible) or handling of release and completion of periodic tasks in a real-time operating system. We observe that focusing on the urgent part of the behaviour of a timed system allows us to exploit the full range of partial order reduction techniques already validated for untimed systems. This leads to an exact and broadly applicable reduction technique, which we shall demonstrate on a series of industrial case studies showing significant space and time reduction. In order to highlight the generality of the approach, we first describe our reduction technique in the setting of timed labelled transition systems. We shall then instantiate it to timed-arc Petri nets and implement and experimentally validate it in the model checker TAPAAL [19].

Let us now briefly introduce the model of timed-arc Petri nets and explain our reduction ideas. In timed-arc Petri nets, each token is associated with a nonnegative integer representing its age, and input arcs to transitions contain intervals restricting the ages of tokens available for transition firing (if an interval is missing, we assume the default interval [0,∞] that accepts all token ages). In Fig. 1a we present a simple monitoring system modelled as a timed-arc Petri net. The system consists of two identical sensors where sensor i, i ∈ {1, 2}, is represented by the places b<sub>i</sub> and m<sub>i</sub> and the transitions s<sub>i</sub> and r<sub>i</sub>. Once a token of age 0 is placed into the place b<sub>i</sub>, the sensor gets started by executing the transition s<sub>i</sub> and moving the token from place b<sub>i</sub> to m<sub>i</sub>, where the monitoring process starts. As the place b<sub>i</sub> has an associated age invariant ≤ 0, meaning that all tokens in b<sub>i</sub> must be of age at most 0, no time delay is allowed and the firing of s<sub>i</sub> becomes urgent. In the monitoring place m<sub>i</sub> we have to delay one time unit before the transition r<sub>i</sub>, reporting the reading of the sensor, becomes enabled. Due to the age invariant ≤ 1 in the place m<sub>i</sub>, we cannot wait longer than one time unit, after which r<sub>i</sub> also becomes urgent.

The places c<sub>1</sub>, c<sub>2</sub> and c<sub>3</sub> together with the transitions i<sub>1</sub>, i<sub>2</sub> and t are used to control the initialization of the sensors. At the execution start, only the transition i<sub>1</sub> is enabled and, because it is an urgent transition (denoted by the white circle), no delay is initially possible and i<sub>1</sub> must be fired immediately, removing the token of age 0 from c<sub>1</sub> and placing a new token of age 0 into c<sub>2</sub>. At the same time, the first sensor gets started as i<sub>1</sub> also places a fresh token of age 0 into b<sub>1</sub>. Now the control part of the net can decide to fire the transition i<sub>2</sub> without any delay and start the second sensor, or it can delay one unit of time, after which i<sub>2</sub> becomes urgent due to the age invariant ≤ 1, as the token in c<sub>2</sub> is now of age 1. If i<sub>2</sub> is fired now, it will place a fresh token of age 0 into b<sub>2</sub>. However, the token that is moved from c<sub>2</sub> to c<sub>3</sub> by the pair of transport arcs with the diamond-shaped arrow tips preserves its age 1, so now we have to wait precisely one more time unit before t becomes enabled. Moreover, before t can be fired, the places m<sub>1</sub> and m<sub>2</sub> must be empty, as otherwise the firing of t is disabled by the inhibitor arcs with circle-shaped arrow tips.

In Fig. 1b we present the reachable state space of the simple monitoring system, where markings are written using a notation like c<sub>3</sub>:1 + b<sub>2</sub>:2, which stands for one token of age 1 in place c<sub>3</sub> and one token of age 2 in place b<sub>2</sub>. The dashed boxes represent the markings that can be avoided during the state space exploration when we apply our partial order reduction method for checking if

(a) TAPN model of a simple monitoring system

(b) Reachable state space generated by the net in Figure 1a

**Fig. 1.** Simple monitoring system

the termination transition t can become enabled from the initial marking. We can see that the partial order reduction is applied such that it preserves at least one path to all configurations where our goal is reached (transition t is enabled) and where time is not urgent anymore (i.e. to the configurations that allow the delay of 1 time unit). The basic idea of our approach is to apply the stubborn set reduction on the commutative diamonds where time is not allowed to elapse.

*Related Work.* Our stubborn set reduction is based on the work of Valmari et al. [28,39]. We formulate their stubborn set method in the abstract framework of labelled transition systems with time and add further axioms for time elapsing in order to guarantee preservation of the reachability properties.

For Petri nets, Yoneda and Schlingloff [41] apply a partial order reduction to one-safe time Petri nets; however, as claimed in [38], the method is mainly suitable for small to medium models due to its computational overhead, as confirmed also in [29]. The experimental evaluation in [41] shows only one selected example. Sloan and Buy [38] try to improve on the efficiency of the method, at the expense of considering only a rather limited model of *simple time Petri nets* where each transition has a statically assigned duration. Lilius [29] suggests instead using an alternative semantics of timed Petri nets to remove the issues related to the global nature of time, allowing him to apply the untimed partial order approaches directly. However, the semantics is nonstandard and no experiments are reported. Another approach is by Virbitskaite and Pokozy [40], who apply a partial order method on the *region graph* of bounded time Petri nets. Region graphs are in general not an efficient method for state space representation and the method is demonstrated only on a small buffer example with no further experimental validation. Recently, partial order techniques were suggested by André et al. for parametric time Petri nets [5]; however, the approach works only for safe and acyclic nets. Boucheneb and Barkaoui [12–14] discuss a partial order reduction technique for timed Petri nets based on *contracted state class graphs* and present a few examples on a prototype implementation (the authors do not refer to any publicly available tool). Their method is different from ours as it aims at adding timing constraints to the independence relation, but it does not exploit urgent behaviour. Moreover, the models of time Petri nets and timed-arc Petri nets are, even on the simplest nets, incomparable due to the different ways of modelling time.

The fact that we are still lacking a practically applicable method for the time Petri net model is documented by the missing implementation of the technique in leading tools for time Petri net model checking such as TINA [9] and Romeo [22]. We are not aware of any work on partial order reduction techniques for the class of timed-arc Petri nets that we consider in this paper. This is likely because this class of nets provides even more complex timing behaviour, as we consider unbounded nets where each token carries its own timing information (and needs a separate clock to remember the timing), while in time Petri nets timing is associated only with an a priori fixed number of transitions in the net.

In the setting of timed automata [3], early work on partial order reduction includes Bengtsson et al. [8] and Minea [32], who introduce the notion of local as well as global clocks but provide no experimental evaluation. Dams et al. [18] introduce the notion of *covering* in order to generalize dependencies, but also here no empirical evaluation is provided. Lugiez, Niebert et al. [30,34] study the notion of *event zones* (capturing time durations between events) and use it to implement Mazurkiewicz-trace reductions. Salah et al. [37] introduce and implement an exact method based on merging zones resulting from different interleavings. The method achieves performance comparable with the approximate convex-hull abstraction, which is by now superseded by the exact LU-abstraction [7]. Most recently, Hansen et al. [23] introduce a variant of stubborn sets for reducing an *abstracted zone graph*, thus in general offering over-approximate analysis. Our technique is orthogonal to the other approaches mentioned above; not only is the model different, but the application of our reduction also gives exact results and is based on new reduction ideas. Finally, the idea of applying partial order reduction for independent events that happen at the same time appeared also in [15], where the authors, however, use a static method that declares actions as independent only if they do not communicate, do not emit signals and do not access any shared variables. Our realization of the method for the case of timed-arc Petri nets applies a dynamic (on-the-fly) reduction, while executing a detailed timing analysis that allows us to declare more transitions as independent—sometimes even in the case when they share resources.

#### **2 Partial Order Reduction for Timed Systems**

We shall now describe the general idea of our partial order reduction technique (based on stubborn sets [28,39]) in terms of timed transition systems. We consider real-time delays in the rest of this section, as these results are not specific only to discrete time semantics. Let A be a given set of actions such that A ∩ R<sub>≥0</sub> = ∅, where R<sub>≥0</sub> stands for the set of nonnegative real numbers.

**Definition 1 (Timed Transition System).** *A* timed transition system *is a tuple* (S, s<sub>0</sub>, −→) *where* S *is a set of states,* s<sub>0</sub> ∈ S *is the initial state, and* −→ ⊆ S × (A ∪ R<sub>≥0</sub>) × S *is the transition relation.*

If (s, α, s′) ∈ −→ we write s −α→ s′. We implicitly assume that if s −0→ s′ then s = s′, i.e. zero time delays do not change the current state. The set of *enabled actions* at a state s ∈ S is defined as En(s) = {a ∈ A | ∃s′ ∈ S. s −a→ s′}. Given a sequence of actions w = α<sub>1</sub>α<sub>2</sub>α<sub>3</sub> ... α<sub>n</sub> ∈ (A ∪ R<sub>≥0</sub>)<sup>∗</sup>, we write s −w→ s′ iff s −α<sub>1</sub>→ ··· −α<sub>n</sub>→ s′. If there is a sequence w of length n such that s −w→ s′, we also write s −→<sup>n</sup> s′. Finally, let −→<sup>∗</sup> be the reflexive and transitive closure of the relation −→, where s −→ s′ iff there is α ∈ A ∪ R<sub>≥0</sub> with s −α→ s′.

For the rest of this section, we assume a fixed timed transition system (S, s<sub>0</sub>, −→) and a set of goal states G ⊆ S. The *reachability problem* is to decide whether there is s ∈ G such that s<sub>0</sub> −→<sup>∗</sup> s.

We now develop the theoretical foundations of stubborn sets for timed transition systems. A state s ∈ S is *zero time* if time cannot elapse at s. We denote this property of a state s by the predicate zt(s), defined by: zt(s) iff for all s′ ∈ S and all d ∈ R<sub>≥0</sub>, if s −d→ s′ then d = 0. A *reduction* of a timed transition system is a function St : S → 2<sup>A</sup>. A reduction defines a reduced transition relation −→<sub>St</sub> ⊆ −→ such that s −α→<sub>St</sub> s′ iff s −α→ s′ and α ∈ St(s) ∪ R<sub>≥0</sub>. For a given state s ∈ S we define $\overline{\mathsf{St}(s)} = A \setminus \mathsf{St}(s)$ as the set of all actions that are not in St(s).
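For intuition, a search over the reduced transition relation −→<sub>St</sub> can be sketched as follows (a minimal Python sketch, not part of the paper: the callbacks `enabled`, `fire`, `delays`, `st` and `goal` are hypothetical stand-ins for En, the action transitions, the delay successors, St and G):

```python
from collections import deque

def reduced_reachable(s0, enabled, fire, delays, st, goal):
    """BFS over -->_St: from each state, follow only actions in St(s)
    plus all delay successors (delays are never pruned)."""
    seen, queue = {s0}, deque([s0])
    while queue:
        s = queue.popleft()
        if goal(s):
            return True
        succs = [fire(s, a) for a in enabled(s) & st(s)]  # alpha in St(s)
        succs += delays(s)                                # alpha in R>=0
        for s2 in succs:
            if s2 not in seen:
                seen.add(s2)
                queue.append(s2)
    return False
```

With the trivial choice St(s) = En(s) this degenerates to ordinary reachability; the point of Sect. 2 is to justify smaller choices of St.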

**Definition 2 (Reachability Conditions).** *A reduction* St *on a timed transition system* (S, s0, −→) *is* reachability preserving *if it satisfies the following four conditions.*

$$\begin{array}{ll} (\mathcal{Z}) & \forall s \in S.\ \neg\mathsf{zt}(s) \implies \mathsf{En}(s) \subseteq \mathsf{St}(s) \\ (\mathcal{D}) & \forall s, s' \in S.\ \forall w \in \overline{\mathsf{St}(s)}^{*}.\ \mathsf{zt}(s) \wedge s \xrightarrow{w} s' \implies \mathsf{zt}(s') \\ (\mathcal{R}) & \forall s, s' \in S.\ \forall w \in \overline{\mathsf{St}(s)}^{*}.\ \mathsf{zt}(s) \wedge s \xrightarrow{w} s' \wedge s \notin G \implies s' \notin G \\ (\mathcal{W}) & \forall s, s' \in S.\ \forall w \in \overline{\mathsf{St}(s)}^{*}.\ \forall a \in \mathsf{St}(s).\ \mathsf{zt}(s) \wedge s \xrightarrow{wa} s' \implies s \xrightarrow{aw} s' \end{array}$$

Condition Z declares that in a state where a delay is possible, all enabled actions become stubborn actions. Condition D guarantees that in order to enable a time delay from a state where delaying is not allowed, a stubborn action must be executed. Similarly, Condition R requires that a stubborn action must be executed before a goal state can be reached from a non-goal state. Finally, Condition W allows us to commute stubborn actions with non-stubborn actions. The following theorem shows that reachability preserving reductions generate pruned transition systems where the reachability of goal states is preserved.

**Theorem 1 (Shortest-Distance Reachability Preservation).** *Let* St *be a reachability preserving reduction satisfying* Z*,* D*,* R *and* W*. Let* s ∈ S*. If* s −→<sup>n</sup> s′ *for some* s′ ∈ G*, then also* s −→<sup>m</sup><sub>St</sub> s′′ *for some* s′′ ∈ G *where* m ≤ n*.*

*Proof.* We proceed by induction on n. *Base step.* If n = 0, then s = s′ and m = n = 0. *Inductive step.* Let s<sub>0</sub> −α<sub>0</sub>→ s<sub>1</sub> −α<sub>1</sub>→ ··· −α<sub>n</sub>→ s<sub>n+1</sub> where s<sub>0</sub> ∉ G and s<sub>n+1</sub> ∈ G. Without loss of generality we assume that α<sub>i</sub> ≠ 0 for all i, 0 ≤ i ≤ n (otherwise we can simply skip these 0-delay actions and get a shorter sequence). We have two cases. Case ¬zt(s<sub>0</sub>): by condition Z we have En(s<sub>0</sub>) ⊆ St(s<sub>0</sub>) and by the definition of −→<sub>St</sub> we have s<sub>0</sub> −α<sub>0</sub>→<sub>St</sub> s<sub>1</sub> since α<sub>0</sub> ∈ En(s<sub>0</sub>) ∪ R<sub>≥0</sub>. By the induction hypothesis we have s<sub>1</sub> −→<sup>m</sup><sub>St</sub> s′ with s′ ∈ G and m ≤ n, hence m + 1 ≤ n + 1. Case zt(s<sub>0</sub>): let w = α<sub>0</sub>α<sub>1</sub> ... α<sub>n</sub> and let α<sub>i</sub> be such that α<sub>i</sub> ∈ St(s<sub>0</sub>) and α<sub>k</sub> ∉ St(s<sub>0</sub>) for all k < i, i.e. α<sub>i</sub> is the first stubborn action in w. Such an α<sub>i</sub> has to exist, as otherwise s<sub>n+1</sub> ∉ G due to condition R. Because of condition D we get zt(s<sub>k</sub>) for all k, 0 ≤ k < i, as otherwise α<sub>i</sub> could not be the first stubborn action in w. We can hence split w as w = uα<sub>i</sub>v with u ∈ (A \ St(s<sub>0</sub>))<sup>∗</sup>. Since all states on the path to s<sub>i</sub> are zero time, condition W allows us to swap α<sub>i</sub> to the front, yielding s<sub>0</sub> −α<sub>i</sub>→ s′<sub>1</sub> −uv→ s<sub>n+1</sub> with |uv| = n. Since α<sub>i</sub> ∈ St(s<sub>0</sub>) we get s<sub>0</sub> −α<sub>i</sub>→<sub>St</sub> s′<sub>1</sub>, and by the induction hypothesis we have s′<sub>1</sub> −→<sup>m</sup><sub>St</sub> s′′ where s′′ ∈ G, m ≤ n, and hence m + 1 ≤ n + 1. □

#### **3 Timed-Arc Petri Nets**

We shall now define the model of timed-arc Petri nets (as informally described in the introduction) together with a reachability logic and a few technical lemmas needed later on. Let N<sub>0</sub> = N ∪ {0} and N<sub>0</sub><sup>∞</sup> = N<sub>0</sub> ∪ {∞}. We define the set of *well-formed closed time intervals* as I = {[a, b] | a ∈ N<sub>0</sub>, b ∈ N<sub>0</sub><sup>∞</sup>, a ≤ b} and its subset I<sub>inv</sub> = {[0, b] | b ∈ N<sub>0</sub><sup>∞</sup>} used in age invariants.

**Definition 3 (Timed-Arc Petri Net).** *A* timed-arc Petri net *(TAPN) is a 9-tuple* N = (P, T, T*urg*, *IA*, *OA*, *g*, *w*, *Type*, *I*) *where*

- P *is a finite set of places,*
- T *is a finite set of transitions such that* P ∩ T = ∅*,*
- T*urg* ⊆ T *is the set of urgent transitions,*
- *IA* ⊆ P × T *is a finite set of input arcs,*
- *OA* ⊆ T × P *is a finite set of output arcs,*
- *g* : *IA* → I *is a time constraint function assigning guards to input arcs such that*
	- *if* (p, t) ∈ *IA and* t ∈ T*urg* *then g*((p, t)) = [0,∞]*,*
- *w* : *IA* ∪ *OA* → N *is a function assigning weights to input and output arcs,*
- *Type* : *IA* ∪ *OA* → *Types* *is a type function assigning a type to all arcs, where Types* = {*Normal*, *Inhib*} ∪ {*Transport*<sub>j</sub> | j ∈ N} *such that*
	- *if Type*(z) = *Inhib then* z ∈ *IA and g*(z) = [0,∞]*,*
	- *if Type*((p, t)) = *Transport*<sub>j</sub> *for some* (p, t) ∈ *IA then there is exactly one* (t, p′) ∈ *OA such that Type*((t, p′)) = *Transport*<sub>j</sub>*,*
	- *if Type*((t, p′)) = *Transport*<sub>j</sub> *for some* (t, p′) ∈ *OA then there is exactly one* (p, t) ∈ *IA such that Type*((p, t)) = *Transport*<sub>j</sub>*,*
	- *if Type*((p, t)) = *Transport*<sub>j</sub> = *Type*((t, p′)) *then w*((p, t)) = *w*((t, p′))*, and*
- *I* : P → I<sub>inv</sub> *is a function assigning age invariants to places.*

Note that for transport arcs we assume that they come in pairs (for each type *Transport* <sup>j</sup> ) and that their weights match. Also for inhibitor arcs and for input arcs to urgent transitions, we require that the guards are [0,∞].

Before we give the formal semantics of the model, let us fix some notation. Let N = (P, T, T*urg*, *IA*, *OA*, *g*, *w*, *Type*, *I*) be a TAPN. We denote by <sup>•</sup>x = {y ∈ P ∪ T | (y, x) ∈ *IA* ∪ *OA*, *Type*((y, x)) ≠ *Inhib*} the preset of a transition or a place x. Similarly, the postset is defined as x<sup>•</sup> = {y ∈ P ∪ T | (x, y) ∈ *IA* ∪ *OA*}. We denote by <sup>◦</sup>t = {p ∈ P | (p, t) ∈ *IA* ∧ *Type*((p, t)) = *Inhib*} the inhibitor preset of a transition t. The inhibitor postset of a place p is defined as p<sup>◦</sup> = {t ∈ T | (p, t) ∈ *IA* ∧ *Type*((p, t)) = *Inhib*}. Let B(R<sub>≥0</sub>) be the set of all finite multisets over R<sub>≥0</sub>. A *marking* M on N is a function M : P −→ B(R<sub>≥0</sub>) where for every place p ∈ P and every token x ∈ M(p) we have x ∈ *I*(p); in other words, all tokens have to satisfy the age invariants. The set of all markings in a net N is denoted by M(N).

We write (p, x) to denote a token at a place p with the age x ∈ R<sub>≥0</sub>. Then M = {(p<sub>1</sub>, x<sub>1</sub>), (p<sub>2</sub>, x<sub>2</sub>), ..., (p<sub>n</sub>, x<sub>n</sub>)} is a multiset representing a marking M with n tokens of ages x<sub>i</sub> in places p<sub>i</sub>. We define the size of a marking as |M| = Σ<sub>p∈P</sub> |M(p)| where |M(p)| is the number of tokens located in the place p. A marked TAPN (N, M<sub>0</sub>) is a TAPN N together with an initial marking M<sub>0</sub> with all tokens of age 0.
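Concretely, a marking can be stored as a map from places to multisets of token ages (an illustrative Python sketch using integer ages, in line with the discrete-time semantics adopted later in this section; the helper names are ours, not the paper's):

```python
from collections import Counter

def size(marking):
    """|M| = sum over all places p of |M(p)|, the number of tokens in p."""
    return sum(sum(ages.values()) for ages in marking.values())

def satisfies_invariants(marking, inv):
    """Check that every token age x in M(p) lies within I(p) = [0, inv[p]];
    places missing from inv have the default invariant [0, inf)."""
    return all(x <= inv.get(p, float('inf'))
               for p, ages in marking.items() for x in ages)

# The marking c3:1 + b2:2 from Fig. 1b: one token of age 1 in c3,
# one token of age 2 in b2.
M = {'c3': Counter({1: 1}), 'b2': Counter({2: 1})}
```

A `Counter` keyed by age is a direct encoding of a finite multiset over token ages.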

**Definition 4 (Enabledness).** *Let* N = (P, T, T*urg*, *IA*, *OA*, *g*, *w*, *Type*, *I*) *be a TAPN. We say that a transition* t ∈ T *is* enabled *in a marking* M *by the multisets of tokens In* = {(p, x<sub>p</sub><sup>1</sup>), (p, x<sub>p</sub><sup>2</sup>), ..., (p, x<sub>p</sub><sup>w((p,t))</sup>) | p ∈ <sup>•</sup>t} ⊆ M *and Out* = {(p′, x<sub>p′</sub><sup>1</sup>), (p′, x<sub>p′</sub><sup>2</sup>), ..., (p′, x<sub>p′</sub><sup>w((t,p′))</sup>) | p′ ∈ t<sup>•</sup>} *if*

*– for all input arcs except the inhibitor arcs, the tokens from In satisfy the age guards of the arcs, i.e.*

$$\forall p \in {}^{\bullet}t.\ x_p^i \in g((p, t)) \ \text{for } 1 \le i \le w((p, t)).$$

*– for any inhibitor arc pointing from a place* p *to the transition* t*, the number of tokens in* p *is smaller than the weight of the arc, i.e.*

$$\forall (p, t) \in IA. Type((p, t)) = Inhib \Rightarrow |M(p)| < w((p, t))$$

*– for all input arcs and output arcs which constitute a transport arc, the age of the input token must be equal to the age of the output token and satisfy the invariant of the output place, i.e.*

$$\begin{aligned} &\forall (p, t) \in IA.\ \forall (t, p') \in OA.\ Type((p, t)) = Type((t, p')) = Transport_j \\ &\quad\Rightarrow \left(x_p^i = x_{p'}^i \wedge x_{p'}^i \in I(p')\right) \ \text{for } 1 \le i \le w((p, t)) \end{aligned}$$

*– for all normal output arcs, the age of the output token is* 0*, i.e.*

$$\forall (t, p') \in OA.\ Type((t, p')) = Normal \Rightarrow x_{p'}^i = 0 \ \text{for } 1 \le i \le w((t, p')).$$

A given marked TAPN (N, M<sub>0</sub>) defines a timed transition system T(N) = (M(N), M<sub>0</sub>, −→) where the states are markings and the transitions are as follows.

- If t ∈ T is enabled in a marking M by the multisets of tokens *In* and *Out*, then t can fire and produce the marking M′ = (M ∖ *In*) ⊎ *Out*, where ∖ is the multiset difference and ⊎ the multiset union; we write M −t→ M′ for this switch transition.
- A delay of d ∈ N<sub>0</sub> time units is allowed in M if
	- (x + d) ∈ I(p) for all p ∈ P and all x ∈ M(p), i.e. by delaying d time units no token violates any of the age invariants, and
	- if M −t→ M′ for some t ∈ T*urg* then d = 0, i.e. enabled urgent transitions disallow time passing.

By delaying d time units in M we reach the marking M′ defined as M′(p) = {x + d | x ∈ M(p)} for all p ∈ P; we write M −d→ M′ for this delay transition.
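The delay transition M −d→ M′ follows directly from the two conditions above (a Python sketch with discrete time; the callback `urgent_enabled`, standing for "some t ∈ T*urg* is enabled in M", is an assumed placeholder):

```python
from collections import Counter

def delay(marking, d, inv, urgent_enabled):
    """Return M' with all token ages shifted by d, or None if the delay
    is not allowed. inv maps each place p to the bound b of I(p) = [0, b];
    missing places default to no invariant."""
    if d > 0 and urgent_enabled(marking):
        return None  # enabled urgent transitions disallow time passing
    for p, ages in marking.items():
        for x in ages:
            if x + d > inv.get(p, float('inf')):
                return None  # some token would violate its age invariant
    return {p: Counter({x + d: n for x, n in ages.items()})
            for p, ages in marking.items()}
```

For the net of Fig. 1a, delaying one unit in a marking with a token of age 0 in m<sub>1</sub> (invariant ≤ 1) succeeds, while delaying two units does not.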

Note that the semantics above defines the discrete-time semantics, as the delays are restricted to nonnegative integers. It is well known that for timed-arc Petri nets with nonstrict intervals, the marking reachability problems for discrete and continuous time coincide [31]. This is, however, not the case for more complex properties like liveness that can be expressed in the CTL logic (for counterexamples see e.g. [25]).

#### **3.1 Reachability Logic and Interesting Sets of Transitions**

We now describe a logic for expressing the properties of markings based on the number of tokens in places and transition enabledness, inspired by the logic


**Table 1.** Interesting transitions of ϕ (assuming M ⊭ ϕ; otherwise A<sub>M</sub>(ϕ) = ∅)

**Table 2.** Increasing and decreasing transitions of expression e


used in the Model Checking Contest (MCC) Property Language [27]. Let N = (P, T, T*urg* ,*IA*, *OA*, *g*,*w*, *Type*,*I*) be a TAPN. The formulae of the logic are given by the abstract syntax:

$$\begin{array}{rcl} \varphi &::=& \mathit{deadlock} \mid t \mid e_1 \bowtie e_2 \mid \varphi_1 \wedge \varphi_2 \mid \varphi_1 \vee \varphi_2 \mid \neg \varphi \\ e &::=& c \mid p \mid e_1 \oplus e_2 \end{array}$$

where t ∈ T, ⋈ ∈ {<, ≤, =, ≠, >, ≥}, c ∈ Z, p ∈ P, and ⊕ ∈ {+, −, ∗}. Let Φ be the set of all such formulae and let E<sub>N</sub> be the set of arithmetic expressions over the net N. The semantics of ϕ in a marking M ∈ M(N) is given by


assuming a standard semantics for Boolean operators and where the semantics of arithmetic expressions in a marking M is as follows: *eval*<sub>M</sub>(c) = c, *eval*<sub>M</sub>(p) = |M(p)|, and *eval*<sub>M</sub>(e<sub>1</sub> ⊕ e<sub>2</sub>) = *eval*<sub>M</sub>(e<sub>1</sub>) ⊕ *eval*<sub>M</sub>(e<sub>2</sub>).
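The semantics can be read off as a small recursive evaluator (an illustrative Python sketch; the encoding of formulae as nested tuples and the callback `enabled` for transition enabledness are our assumptions, not the paper's notation):

```python
import operator

OPS = {'<': operator.lt, '<=': operator.le, '=': operator.eq,
       '!=': operator.ne, '>': operator.gt, '>=': operator.ge,
       '+': operator.add, '-': operator.sub, '*': operator.mul}

def eval_expr(e, marking):
    """eval_M(c) = c, eval_M(p) = |M(p)|, eval_M(e1 (+) e2) homomorphically."""
    if isinstance(e, int):
        return e
    if isinstance(e, str):                 # a place name p
        return len(marking.get(e, []))     # number of tokens in p
    op, e1, e2 = e
    return OPS[op](eval_expr(e1, marking), eval_expr(e2, marking))

def holds(phi, marking, enabled):
    """M |= phi, where enabled(M) gives the set of enabled transitions."""
    kind = phi[0]
    if kind == 'deadlock':
        return not enabled(marking)
    if kind == 'fireable':                 # the atomic formula t
        return phi[1] in enabled(marking)
    if kind == 'not':
        return not holds(phi[1], marking, enabled)
    if kind in ('and', 'or'):
        f = all if kind == 'and' else any
        return f(holds(sub, marking, enabled) for sub in phi[1:])
    op, e1, e2 = phi                       # a comparison e1 |><| e2
    return OPS[op](eval_expr(e1, marking), eval_expr(e2, marking))
```

Here a marking is any map from place names to collections of token ages; only the token counts matter for the logic.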

Let ϕ be a formula. We are interested in the question whether we can reach, from the initial marking, some of the goal markings from G<sub>ϕ</sub> = {M ∈ M(N) | M |= ϕ}. In order to guide the reduction such that transitions that lead to the goal markings are included in the generated stubborn set, we define the notion of *interesting transitions* for a marking M relative to ϕ, and we let A<sub>M</sub>(ϕ) ⊆ T denote the set of interesting transitions. Formally, we shall require that whenever M −w→ M′ via a sequence of transitions w = t<sub>1</sub>t<sub>2</sub> ... t<sub>n</sub> ∈ T<sup>∗</sup> where M ∉ G<sub>ϕ</sub> and M′ ∈ G<sub>ϕ</sub>, then there must exist i, 1 ≤ i ≤ n, such that t<sub>i</sub> ∈ A<sub>M</sub>(ϕ).

Table 1 gives a possible definition of A<sub>M</sub>(ϕ). Let us remark that the definition is in several places nondeterministic, allowing for a variety of sets of interesting transitions. Table 1 uses the functions *incr*<sub>M</sub> : E<sub>N</sub> → 2<sup>T</sup> and *decr*<sub>M</sub> : E<sub>N</sub> → 2<sup>T</sup> defined in Table 2. These functions take as input an expression e and return all transitions that can possibly, when fired, increase resp. decrease the evaluation of e. The following lemma formally states the required property of the functions *incr*<sub>M</sub> and *decr*<sub>M</sub>.

**Lemma 1.** *Let* N = (P, T, T*urg*, *IA*, *OA*, *g*, *w*, *Type*, *I*) *be a TAPN and* M ∈ M(N) *a marking. Let* e ∈ E<sub>N</sub> *and let* M −w→ M′ *where* w = t<sub>1</sub>t<sub>2</sub> ... t<sub>n</sub> ∈ T<sup>∗</sup>*.*
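One possible realization of *incr*<sub>M</sub> and *decr*<sub>M</sub> consistent with the stated property can be sketched as a mutual recursion over expressions (our over-approximating sketch; the actual Table 2 definitions also exploit token ages and arc types, which we omit here, and `producers`/`consumers` are assumed maps from each place to the transitions adding to resp. removing from it):

```python
def incr(e, producers, consumers):
    """Over-approximate the transitions that may increase eval_M(e)."""
    if isinstance(e, int):
        return set()                       # a constant never changes
    if isinstance(e, str):                 # a place p: more tokens raise |M(p)|
        return set(producers.get(e, ()))
    op, e1, e2 = e
    if op == '+':
        return incr(e1, producers, consumers) | incr(e2, producers, consumers)
    if op == '-':
        return incr(e1, producers, consumers) | decr(e2, producers, consumers)
    # '*': without sign information, any change of a factor may increase it
    return (incr(e1, producers, consumers) | decr(e1, producers, consumers) |
            incr(e2, producers, consumers) | decr(e2, producers, consumers))

def decr(e, producers, consumers):
    """Dual of incr: transitions that may decrease eval_M(e)."""
    if isinstance(e, int):
        return set()
    if isinstance(e, str):
        return set(consumers.get(e, ()))
    op, e1, e2 = e
    if op == '+':
        return decr(e1, producers, consumers) | decr(e2, producers, consumers)
    if op == '-':
        return decr(e1, producers, consumers) | incr(e2, producers, consumers)
    return (incr(e1, producers, consumers) | decr(e1, producers, consumers) |
            incr(e2, producers, consumers) | decr(e2, producers, consumers))
```

Note how subtraction swaps the roles of the two functions on its right operand, exactly as one would expect from monotonicity.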


We finish this section with the main technical lemma, showing that at least one interesting transition must be fired before we can reach a marking satisfying a given reachability formula.

**Lemma 2.** *Let* N = (P, T, T*urg*, *IA*, *OA*, *g*, *w*, *Type*, *I*) *be a TAPN, let* M ∈ M(N) *be its marking and let* ϕ ∈ Φ *be a given formula. If* M ⊭ ϕ *and* M −w→ M′ *where* w ∈ (T \ A<sub>M</sub>(ϕ))<sup>∗</sup> *then* M′ ⊭ ϕ*.*

#### **4 Partial Order Reductions for TAPN**

We are now ready to state the main theorem that provides sufficient syntax-driven conditions for a reduction in order to guarantee preservation of reachability. Let N = (P, T, T*urg*, *IA*, *OA*, *g*, *w*, *Type*, *I*) be a TAPN, let M ∈ M(N) be a marking of N, and let ϕ ∈ Φ be a formula. We recall that A<sub>M</sub>(ϕ) is the set of interesting transitions as defined earlier.

**Theorem 2 (Reachability Preserving Closure).** *Let* St *be a reduction such that for all* M ∈ M(N) *it satisfies the following conditions.*

*1. If* ¬zt(M) *then* En(M) ⊆ St(M)*.*

*2.* A<sub>M</sub>(ϕ) ⊆ St(M)*.*

*3. If* zt(M) *then either*

	- *(a) there is* t ∈ T*urg* ∩ En(M) ∩ St(M) *where* <sup>•</sup>(<sup>◦</sup>t) ⊆ St(M)*, or*
	- *(b) there is* p ∈ P *where I*(p) = [a, b] *and* b ∈ M(p) *such that* t ∈ St(M) *for every* t ∈ p<sup>•</sup> *where* b ∈ *g*((p, t))*.*


(a) Transitions *t*<sub>1</sub> and *t*<sub>2</sub> can disable resp. inhibit the urgent transition *t*

(b) Transition *t*<sub>2</sub> can remove the token of age 5 from *p*

**Fig. 2.** Cases for Condition 3

*4. For all* t ∈ St(M) \ En(M) *either*

	- *(a) there is* p ∈ <sup>•</sup>t *such that* |{x ∈ M(p) | x ∈ *g*((p, t))}| < w((p, t)) *and*
		- t′ ∈ St(M) *for all* t′ ∈ <sup>•</sup>p *where there is* p′ ∈ <sup>•</sup>t′ *with Type*((t′, p)) = *Type*((p′, t′)) = *Transport*<sub>j</sub> *and where g*((p′, t′)) ∩ *g*((p, t)) ≠ ∅*, and*
		- *if* 0 ∈ *g*((p, t)) *then also* <sup>•</sup>p ⊆ St(M)*, or*
	- *(b) there is* p ∈ <sup>◦</sup>t *where* |M(p)| ≥ w((p, t)) *such that*
		- t′ ∈ St(M) *for all* t′ ∈ p<sup>•</sup> *where* M(p) ∩ *g*((p, t′)) ≠ ∅*.*

*5. For all* t ∈ St(M) ∩ En(M) *we have*

	- *(a)* t′ ∈ St(M) *for every* t′ ∈ p<sup>•</sup> *where* p ∈ <sup>•</sup>t *and g*((p, t)) ∩ *g*((p, t′)) ≠ ∅*, and*
	- *(b)* (t<sup>•</sup>)<sup>◦</sup> ⊆ St(M)*.*

*Then* St *satisfies* Z*,* D*,* R*, and* W*.*

Let us now briefly discuss the conditions of Theorem 2. Clearly, Condition 1 ensures that if time can elapse, we include all enabled transitions into the stubborn set and Condition 2 guarantees that all interesting transitions (those that can potentially make the reachability proposition true) are included as well.

Condition 3 makes sure that if time elapsing is disabled then any transition that can possibly enable time elapsing will be added to the stubborn set. There are two situations in which time progress can be disabled. Either there is an urgent enabled transition, like the transition t in Fig. 2a. Since t<sub>2</sub> can add a token to p<sub>2</sub> and thereby inhibit t, Condition 3a makes sure that t<sub>2</sub> is added into the stubborn set in order to satisfy D. As t<sub>1</sub> can remove the token of age 3 from p<sub>1</sub> and hence disable t, we must add t<sub>1</sub> to the stubborn set too (guaranteed by Condition 5a). The other situation where time gets stopped is when a place with an age invariant contains a token that disallows time passing, like in Fig. 2b, where time is disabled because the place p has a token of age 5, which is the maximum possible age of tokens in p due to the age invariant. Since t<sub>2</sub> can remove the token of age 5 from p, we include it in the stubborn set due to Condition 3b. On the other hand, t<sub>1</sub> does not have to be included in the stubborn set, as its firing cannot remove the token of age 5 from p.

Condition 4 makes sure that a disabled stubborn transition can never be enabled by a non-stubborn transition. There are two reasons why a transition can be disabled. Either, as in Fig. 3a where t is disabled, there is an insufficient number of tokens of appropriate age to fire the transition. In this case, Condition 4a

(a) Transition *t*<sub>1</sub> can transport well-aged tokens into *p* and enable *t*

(b) Transition *t*<sub>1</sub> can enable *t* by removing tokens from *p*

**Fig. 3.** Cases for Condition 4

(a) Stubborn transition *t* can disable both *t*<sup>2</sup> and *t*<sup>3</sup>

**Fig. 4.** Cases for Condition 5

makes sure that transitions that can add tokens of a suitable age via transport arcs are included in the stubborn set. This is the case for the transition t<sup>1</sup> in our example, as [2, 5] has a nonempty intersection with [4, 6]. On the other hand, t<sup>3</sup> does not have to be added. As the transition t<sup>2</sup> only adds fresh tokens of age 0 to p via normal arcs, there is no need to add t<sup>2</sup> to the stubborn set either. The other reason for a transition to be disabled is an inhibitor arc, as shown for the transition t in Fig. 3b. Condition 4b makes sure that t<sup>1</sup> is added to the stubborn set, as it can enable t (the interval [6, 8] has a nonempty intersection with the tokens of age 6 and 7 in the place p). As this is not the case for t<sup>2</sup>, this transition can be left out of the stubborn set.

Finally, Condition 5 guarantees that enabled stubborn transitions can never disable any non-stubborn transitions. For an illustration, consider Fig. 4a and assume that t is an enabled stubborn transition. Firing of t can remove the token of age 4 from p and disable t<sup>2</sup>, hence t<sup>2</sup> must become stubborn by Condition 5a in order to satisfy W. On the other hand, the intervals [6, 8] and [2, 5] have an empty intersection, so there is no need to declare t<sup>1</sup> a stubborn transition. Moreover, firing of t can also disable the transition t<sup>3</sup> due to the inhibitor arc, so we must add t<sup>3</sup> to the stubborn set by Condition 5b.

The conditions of Theorem 2 can be turned into an iterative saturation algorithm for the construction of stubborn sets as shown in Algorithm 1. When running this algorithm for the net in our running example, we can reduce the state space exploration for fireability of the transition t as depicted in Fig. 1b. Our last theorem states that the algorithm returns stubborn subsets of enabled

**Algorithm 1.** Construction of a reachability preserving stubborn set

```
input : N = (P, T, T_urg, IA, OA, g, w, Type, I),  M ∈ M(N),  ϕ ∈ Φ
output: St(M) ∩ En(M)

 1  if ¬zt(M) then
 2      return En(M)
 3  X := ∅;  Y := A_M(ϕ)
 4  if T_urg ∩ En(M) ≠ ∅ then
 5      pick any t ∈ T_urg ∩ En(M)
 6      if t ∉ Y then
 7          Y := Y ∪ {t}
 8      Y := Y ∪ •(◦t)
 9  else
10      pick any p ∈ P where I(p) = [a, b] and b ∈ M(p)
11      forall t ∈ p• do
12          if b ∈ g((p, t)) then
13              Y := Y ∪ {t}
14  while Y ≠ ∅ do
15      pick any t ∈ Y
16      if t ∉ En(M) then
17          if ∃p ∈ •t. |{x ∈ M(p) | x ∈ g((p, t))}| < w((p, t)) then
18              pick any such p
19              forall t′ ∈ •p \ X do
20                  forall p′ ∈ •t′ do
21                      if Type((t′, p)) = Type((p′, t′)) = Transport_j ∧ g((p′, t′)) ∩ g((p, t)) ≠ ∅ then
22                          Y := Y ∪ {t′}
23              if 0 ∈ g((p, t)) then
24                  Y := Y ∪ (•p \ X)
25          else
26              pick any p ∈ ◦t s.t. |M(p)| ≥ w((p, t))
27              forall t′ ∈ p• \ X do
28                  if M(p) ∩ g((p, t′)) ≠ ∅ then
29                      Y := Y ∪ {t′}
30      else
31          forall p ∈ •t do
32              Y := Y ∪ ({t′ ∈ p• | g((p, t)) ∩ g((p, t′)) ≠ ∅} \ X)
33          Y := Y ∪ ((t•)◦ \ X)
34      Y := Y \ {t}
35      X := X ∪ {t}
36  return X ∩ En(M)
```
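Stripped of the Petri net specifics, Algorithm 1 follows a standard worklist saturation pattern: Y is seeded with the interesting transitions (and those required by Condition 3), and transitions are moved from Y to X while the conditions force further transitions in. A minimal Python sketch of this skeleton, with hypothetical callbacks standing in for the Condition 4 and Condition 5 expansions (the names `saturate`, `expand_enabled` and `expand_disabled` are ours, not from the paper):

```python
def saturate(enabled, seed, expand_enabled, expand_disabled):
    """Worklist saturation skeleton behind Algorithm 1 (simplified sketch).

    `seed` plays the role of A_M(phi) plus the transitions required by
    Condition 3; `expand_disabled` returns the transitions forced in by
    Condition 4, and `expand_enabled` those forced in by Condition 5.
    """
    X, Y = set(), set(seed)
    while Y:
        t = Y.pop()                       # pick any t in Y
        extra = (expand_enabled(t) if t in enabled
                 else expand_disabled(t))
        Y |= set(extra) - X - {t}         # only not-yet-processed transitions
        X.add(t)                          # t is now saturated
    return X & set(enabled)               # St(M) ∩ En(M)
```

The loop terminates because X only grows and newly discovered work is filtered against it, mirroring lines 14–35 of Algorithm 1.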

transitions that satisfy the four conditions of Theorem 1, and hence we preserve the reachability property as well as a minimal path to some reachable goal.

**Theorem 3.** *Algorithm <sup>1</sup> terminates and returns* St(M) <sup>∩</sup> En(M) *for some reduction* St *that satisfies* Z*,* D*,* R*, and* W*.*

#### **5 Implementation and Experiments**

We implemented our partial order method in C++ and integrated it into the model checker TAPAAL [19] and its discrete time engine verifydtapn [4,11]. We evaluate our partial order reduction on a wide range of case studies.

*PatientMonitoring.* The patient monitoring system [17] models a medical system that periodically scans a patient's vital functions through sensors, making sure that abnormal situations are detected and reported within given deadlines. The timed-arc Petri net model was described in [17] for two sensors monitoring the patient's pulse rate and oxygen saturation level. We scale the case study by adding additional sensors.

*BloodTransfusion.* This case study models a larger blood transfusion workflow [16], the benchmarking case study of the Little-JIL language. The timed-arc Petri net model was described in [10] and we verify that the workflow is free of deadlocks (unless all sub-workflows correctly terminate). The problem is scaled by the number of patients receiving a blood transfusion.

*FireAlarm.* This case study uses a modified (due to trade secrets) fire alarm system owned by a German company [20,21]. It models four-channel round-robin frequency-hopping transmission scheduling in order to ensure reliable communication between a number of wireless sensors (by which the case study is scaled) and a central control unit. The protocol is based on time-division multiple access (TDMA) channel access, and we verify that for a given frequency jammer it never takes more than three cycles before a fire alarm is communicated to the central unit.

*BAwPC.* Business Activity with Participant Completion (BAwPC) is a web-service coordination protocol from the WS-BA specification [33] that ensures a consistent agreement on the outcome of long-running distributed applications. In [26] it was shown that the protocol is flawed, and a correct, enhanced variant was suggested. We model check this enhanced protocol and scale it by the capacity of the communication buffer.

*Fischer.* Here we consider the classical Fischer's protocol for ensuring mutual exclusion among a number of timed processes. The timed-arc Petri net model is taken from [2] and it is scaled by the number of processes.
*LynchShavit.* This is another time-based mutual exclusion algorithm by Lynch and Shavit, with the timed-arc Petri net model taken from [1] and scaled by the number of processes.

*MPEG2.* This case study describes the workflow of the MPEG-2 video encoding algorithm run on a multicore processor (the timed-arc Petri net model was published in [35]), and we verify the maximum duration of the workflow. The model is scaled by the number of B frames in the IB<sup>n</sup>P frame sequence.

*AlternatingBit.* This is the classical alternating bit protocol, based on the timed-arc Petri net model given in [24]. The purpose of the protocol is to ensure safe communication between a sender and a receiver over an unreliable medium. Messages are time-stamped in order to compensate


**Table 3.** Experiments with and without partial order reduction (POR)

(via retransmission) for the possibility of losing messages. The case study is scaled by the maximum number of messages in transfer.

All experiments were run on AMD Opteron 6376 processors with 500 GB of memory. In Table 3 we compare the time to verify a model without (NORMAL) and with (POR) partial order reduction, the number of explored markings (in thousands), and the percentage of time and memory reduction. We can observe clear benefits of our technique on PatientMonitoring, BloodTransfusion and FireAlarm, where we are exponentially faster and explore only a fraction of all reachable markings. For example, in FireAlarm we are able to verify its correctness for all 125 sensors, as required by the German company [21]. This would clearly be infeasible without the use of partial order reduction.

In BAwPC, we notice that for the smallest instances there is some overhead from computing the stubborn sets; however, it clearly pays off for the larger instances, where the percentages of reduced state space are closely followed by the percentages of the verification times and in fact improve with the larger instances. The Fischer and LynchShavit case studies demonstrate that even moderate reductions of the state space imply a considerable reduction in the running time, and computing the stubborn sets is well worth the extra effort.

MPEG2 is an example of a model that allows only a negligible reduction of the state space, and where we observe an actual slowdown in the running time due to the computation of the stubborn sets. Nevertheless, the overhead stays constant at about 15%, even for increasing instance sizes. Finally, the AlternatingBit protocol does not allow for any reduction of the state space (even though it contains age invariants), but the overhead in the running time is negligible.

We observed similar performance of our technique in the cases where the reachability property does not hold and a counterexample can be generated.

#### **6 Conclusion**

We suggested a simple yet powerful and application-ready partial order reduction for timed systems. The reduction comes into effect as soon as the timed system enters an urgent configuration where time cannot elapse until a nonempty sequence of transitions is executed. The method is implemented and fully integrated, including GUI support, into the open-source tool TAPAAL. We demonstrated its practical applicability on several case studies and conclude that computing the stubborn sets causes only a minimal overhead while providing large benefits for reducing the state space in numerous models. The method is not specific to the stubborn set reduction technique, and it preserves the shortest execution sequences. Moreover, once time becomes urgent, other classical (untimed) partial order approaches should be applicable too. Our method was instantiated for (unbounded) timed-arc Petri nets with discrete time semantics; however, we claim that the technique allows for general application to other modelling formalisms like timed automata and timed Petri nets, as well as an extension to continuous time. We are currently working on adapting the theory and providing an efficient implementation for UPPAAL-style timed automata with continuous time semantics.

**Acknowledgements.** We thank Mads Johannsen for his help with the GUI support for partial order reduction. The work was funded by the center IDEA4CPS, Innovation Fund Denmark center DiCyPS and ERC Advanced Grant LASSO. The last author is partially affiliated with FI MU in Brno.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **A Counting Semantics for Monitoring LTL Specifications over Finite Traces**

Ezio Bartocci1(B) , Roderick Bloem<sup>2</sup>, Dejan Nickovic<sup>3</sup>, and Franz Roeck<sup>2</sup>

> <sup>1</sup> TU Wien, Vienna, Austria
> ezio.bartocci@tuwien.ac.at
> <sup>2</sup> Graz University of Technology, Graz, Austria
> <sup>3</sup> Austrian Institute of Technology GmbH, Vienna, Austria

**Abstract.** We consider the problem of monitoring a Linear Time Logic (LTL) specification that is defined on infinite paths, over finite traces. For example, we may need to draw a verdict on whether the system satisfies or violates the property "p holds infinitely often." The problem is that there is always a continuation of a finite trace that satisfies the property and a different continuation that violates it.

We propose a two-step approach to address this problem. First, we introduce a counting semantics that computes the number of steps to witness the satisfaction or violation of a formula for each position in the trace. Second, we use this information to make a prediction on inconclusive suffixes. In particular, we consider a *good* suffix to be one that is shorter than the longest witness for a satisfaction, and a *bad* suffix to be shorter than or equal to the longest witness for a violation. Based on this assumption, we provide a verdict assessing whether a continuation of the execution on the same system will presumably satisfy or violate the property.

#### **1 Introduction**

Alice is a verification engineer and she is presented with a new exciting and complex design. The requirements document coming with the design already incorporates functional requirements formalized in Linear Temporal Logic (LTL) [13]. The design contains features that are very challenging for exhaustive verification and her favorite model checking tool does not terminate in reasonable time.

This work was partially supported by the European Union (IMMORTAL project, grant no. 644905), the Austrian FWF (National Research Network RiSE/SHiNE S11405-N23 and S11406-N23), the SeCludE project (funded by UnivPM) and the ENABLE-S3 project that has received funding from the ECSEL Joint Undertaking under Grant Agreement no. 692455. This Joint Undertaking receives support from the European Union's Horizon 2020 research and innovation programme and Austria, Denmark, Germany, Finland, Czech Republic, Italy, Spain, Portugal, Poland, Ireland, Belgium, France, Netherlands, United Kingdom, Slovakia, Norway.

c The Author(s) 2018

H. Chockler and G. Weissenbacher (Eds.): CAV 2018, LNCS 10981, pp. 547–564, 2018. https://doi.org/10.1007/978-3-319-96145-3\_29

*Runtime Verification.* Alice decides to tackle this problem using runtime verification (RV) [3], a lightweight yet rigorous verification method. RV drops the exhaustiveness of model checking and analyzes individual traces generated by the system. Thus, it scales much better to industrial-size designs. RV enables automatic generation of monitors from formalized requirements and thus provides a systematic way to check whether the system traces satisfy (violate) the specification.

*Motivating Example.* In particular, Alice considers the following specification:

ψ ≡ G(request → F grant)

This LTL formula specifies that every request coming from the environment must be granted by the design in some finite (but unbounded) future. Alice realizes that she is trying to check a *liveness* property over a set of *finite* traces. She looks closer at the executions and identifies two interesting example traces, τ<sup>1</sup> and τ<sup>2</sup>, depicted in Table 1.

The monitoring tool reports that both τ<sup>1</sup> and τ<sup>2</sup> presumably violate the unbounded response property. This verdict goes against Alice's intuition. The evaluation of trace τ<sup>1</sup> seems right to her – the request at Cycle 1 is followed by a grant at Cycle 3; however, the request at Cycle 4 is never granted during that execution. There are good reasons to suspect a bug in the design. Then she looks

**Table 1.** Unbounded response property example.


We use "−" instead of "⊥" to improve the trace readability.

at τ<sup>2</sup> and observes that after every request the grant is given exactly 2 cycles later. It is true that the last request at Cycle 7 is not followed by a grant, but this seems to happen because the execution ends at that cycle – the past trace observations give reason to think that this request would be followed by a grant at Cycle 9 if the execution were continued. Thus, Alice is not satisfied by the second verdict.

Alice looks closer at the way that the LTL property is evaluated over finite traces. She finds out that temporal operators are given *strength* – *eventually* and *until* are declared as *strong* operators, while *always* and *weak until* are defined to be *weak* [9]. A strong temporal operator requires all outstanding obligations to be met before the end of the trace. In contrast, a weak temporal operator must not witness any outstanding obligation violation before the end of the trace. Under this interpretation, both τ<sup>1</sup> and τ<sup>2</sup> violate the unbounded response property.

Alice explores another popular approach to evaluate future temporal properties over finite traces – the 3-valued semantics for LTL [4]. In this setting, the Boolean set of verdicts is extended with a third unknown (or maybe) value. A finite trace satisfies (violates) the 3-valued LTL formula if and only if all the infinite extensions of the trace satisfy (violate) the same LTL formula under its classical interpretation. In all other cases, we say that the satisfaction of the formula by the trace is unknown. Alice applies the 3-valued interpretation of LTL on the traces τ<sup>1</sup> and τ<sup>2</sup> to evaluate the unbounded response property. In both situations, she ends up with the unknown verdict. Once again, this is not what she expects and it does not meet her intuition about the satisfaction of the formula by the observed traces.

Alice desires a semantics that evaluates LTL properties on finite traces by taking previous observations into account.

*Contributions.* In this paper, we study the problem of LTL evaluation over finite traces encountered by Alice and propose a solution. We introduce a new counting semantics for LTL that takes into account the intuition illustrated by the example from Table 1. This semantics computes for every position of a trace two values – the distances to the nearest satisfaction and violation of the co-safety, respectively safety, part of the specification. We use this quantitative information to make *predictions* about the (infinite) suffixes of the finite observations. We infer from these values the maximum time that we expect for a future obligation to be fulfilled. We compare it to the value that we have for an open obligation at the end of the trace. If the latter is greater (smaller) than the expected maximum value, we have a good indication of a *presumed violation (satisfaction)* that we report to the user. In particular, our approach will indicate that τ<sup>1</sup> is likely to violate the specification and should be further inspected. In contrast, it will evaluate that τ<sup>2</sup> most likely satisfies the unbounded response property.

*Organization of the Paper.* The rest of the paper is organized as follows. We discuss the related work in Sect. 2 and we provide the preliminaries in Sect. 3. In Sect. 4 we present our new counting semantics for LTL and we show how to make *predictions* about (infinite) suffixes of the finite observations. Section 5 shows the application of our approach to some examples. Finally in Sect. 6 we draw our conclusions.

#### **2 Related Work**

The finitary interpretation of LTL was first considered in [11], where the authors propose to enrich the logic with the *weak* next operator that is dual to the (strong) next operator defined on infinite traces. While the strong next requires the existence of a next state, the weak next trivially evaluates to true at the end of the trace. In [9], the authors propose a more semantic approach with *weak* and *strong* views for evaluating future obligations at the end of the trace. In essence the empty word satisfies (violates) every formula according to the weak (strong) view. These two approaches result in the violation of the specification ψ by both traces τ<sup>1</sup> and τ2.

The authors in [4] propose a 3-valued finitary interpretation of LTL, in which the set {true, false} of verdicts is extended with a third inconclusive verdict. According to the 3-valued LTL, a finite trace satisfies (violates) a specification iff all its infinite extensions satisfy (violate) the same property under the classical LTL interpretation. Otherwise, it evaluates to inconclusive. The main disadvantage of the 3-valued semantics is the dominance of the inconclusive verdict in the evaluation of many interesting LTL formulas. In fact, both τ<sup>1</sup> and τ<sup>2</sup> from Table 1 evaluate to inconclusive against the unbounded response specification ψ.

In [5], the authors combine the weak and strong operators with the 3-valued semantics to refine the inconclusive with {presumably true, presumably false}. The strength of the remaining future obligation dictates the presumable verdict. The authors in [12] propose a finitary semantics for each of the LTL (safety, liveness, persistence and recurrence) hierarchy classes that asymptotically converges to the infinite traces semantics of the logic. In these two works, the specification ψ also evaluates to the same verdict for both the traces τ<sup>1</sup> and τ2.

To summarize, none of the related work handles the unbounded response example from Table 1 in a satisfactory manner. This is due to the fact that these approaches decide about the verdict based on the specification and its remaining future obligations at the end of the trace. In contrast, we propose an approach in which the past observations within the trace are used to predict the future and derive the appropriate verdict. In particular, the application of our semantics for the evaluation of ψ over τ<sup>1</sup> and τ<sup>2</sup> results in presumably true and presumably false verdicts.

In [17], the authors propose another predictive semantics for LTL. In essence, this work assumes that at every point in time the monitor is able to precisely predict a segment of the trace that it has not observed yet and produce its outcome accordingly. In order to ensure such predictive power, this approach requires a white-box setting in which instrumentation and some form of static analysis of the systems are needed in order to foresee in advance the upcoming observations. This is in contrast to our work, in which the monitor remains a passive participant and predicts its verdict only based on the past observations.

In a different research thread [15], the authors introduce the notion of *monitorable* specifications that can be positively or negatively determined by a finite trace. The monitorability of LTL is further studied in [6,14]. This classification of specifications is orthogonal to our work. We focus on providing a sensible evaluation to all LTL properties, including the non-monitorable ones (e.g., GF p).

We also mention the recent work on statistical model checking for LTL [8]. In this work, the authors assume a gray-box setting, where the system-under-test (SUT) is a Markov chain with a known minimum transition probability. This is in contrast to our work, in which we passively observe existing finite traces generated by the SUT, i.e., we have a black-box setting.

In [1], the authors propose extending LTL with a discounting operator and study the properties of the augmented logic. The LTL specification formalism is extended with path-accumulation assertions in [7]. These LTL extensions are motivated by the need for a more quantitative and refined analysis of the systems. In our work, the motivation for the counting semantics is quite different. We use the quantitative information that we collect during the execution of the trace to predict the future behavior of the system and thus improve the quality of the monitoring verdict.

#### **3 Preliminaries**

We first introduce *traces* and Linear Temporal Logic (LTL) that we interpret over 3-valued semantics.

**Definition 1 (Trace).** *Let* P *be a finite set of* propositions *and let* Π = 2<sup>P</sup> *. A (finite or infinite)* trace <sup>π</sup> *is a sequence* <sup>π</sup>1, π2,... <sup>∈</sup> <sup>Π</sup><sup>∗</sup> <sup>∪</sup> <sup>Π</sup><sup>ω</sup> *. We denote by* <sup>|</sup>π| ∈ <sup>N</sup> ∪ {∞} *the* length *of* <sup>π</sup>*. We denote by* <sup>π</sup> · <sup>π</sup> *the concatenation of* <sup>π</sup> <sup>∈</sup> <sup>Π</sup><sup>∗</sup> *and* <sup>π</sup> <sup>∈</sup> <sup>Π</sup><sup>∗</sup> <sup>∪</sup> <sup>Π</sup><sup>ω</sup>*.*

**Definition 2 (Linear Temporal Logic).** *In this paper, we consider linear temporal logic (LTL) and we define its syntax by the grammar:*

$$
\phi := p \mid \neg \phi \mid \phi\_1 \lor \phi\_2 \mid \mathsf{X}\phi \mid \phi\_1 \mathsf{U}\phi\_2,
$$

*where* p ∈ P*. We denote by* Φ *the set of all LTL formulas.*

From the basic definition we can derive other standard Boolean and temporal operators as follows:

$$\top = p \lor \neg p,\ \bot = \neg \top,\ \phi \land \psi = \neg(\neg \phi \lor \neg \psi),\ \mathsf{F}\phi = \top \mathsf{U}\ \phi,\ \mathsf{G}\ \phi = \neg \mathsf{F}\neg \phi$$

Let <sup>π</sup> <sup>∈</sup> <sup>Π</sup><sup>ω</sup> be an infinite trace and <sup>φ</sup> an LTL formula. The satisfaction relation (π, i) |= φ is defined inductively as follows

$$\begin{array}{ll} (\pi, i) \models p & \text{iff } p \in \pi\_i, \\ (\pi, i) \models \neg \phi & \text{iff } (\pi, i) \not\models \phi, \\ (\pi, i) \models \phi\_1 \lor \phi\_2 & \text{iff } (\pi, i) \models \phi\_1 \text{ or } (\pi, i) \models \phi\_2, \\ (\pi, i) \models \mathsf{X}\phi & \text{iff } (\pi, i+1) \models \phi, \\ (\pi, i) \models \phi\_1 \mathsf{U}\phi\_2 & \text{iff } \exists j \ge i \text{ s.t. } (\pi, j) \models \phi\_2 \text{ and } \forall i \le k < j, (\pi, k) \models \phi\_1. \end{array}$$

We now recall the 3-valued semantics from [4]. We denote by [π |=<sup>3</sup> φ] the evaluation of φ with respect to the trace π ∈ Π<sup>∗</sup>, which yields a value in {⊤, ⊥, ?}.

$$\left[\pi \models\_3 \phi\right] = \begin{cases} \top & \forall \pi' \in \varPi^{\omega},\ \pi \cdot \pi' \models \phi, \\ \bot & \forall \pi' \in \varPi^{\omega},\ \pi \cdot \pi' \not\models \phi, \\ ? & \text{otherwise.} \end{cases}$$

We now restrict LTL to a fragment without explicit ⊤ and ⊥ symbols and with the F operator added explicitly to the syntax. We provide an alternative 3-valued semantics for this fragment, denoted by μπ(φ, i), where i ∈ N<sub>>0</sub> indicates a position in or outside the trace. We assume the order ⊥ < ? < ⊤, and extend the Boolean operations to the 3-valued domain with the rules ¬3⊤ = ⊥, ¬3⊥ = ⊤ and ¬3? = ?, and φ<sup>1</sup> ∨<sup>3</sup> φ<sup>2</sup> = max(φ1, φ2); dually, φ<sup>1</sup> ∧<sup>3</sup> φ<sup>2</sup> = min(φ1, φ2). We define the semantics inductively as follows:

$$\begin{array}{ll}
\mu\_{\pi}(p,i) &= \begin{cases} \top & \text{if } i \le |\pi| \text{ and } p \in \pi\_{i}, \\ \bot & \text{else if } i \le |\pi| \text{ and } p \notin \pi\_{i}, \\ ? & \text{otherwise,} \end{cases} \\
\mu\_{\pi}(\neg\phi,i) &= \neg\_{3}\,\mu\_{\pi}(\phi,i), \\
\mu\_{\pi}(\phi\_{1}\lor\phi\_{2},i) &= \mu\_{\pi}(\phi\_{1},i)\lor\_{3}\mu\_{\pi}(\phi\_{2},i), \\
\mu\_{\pi}(\mathsf{X}\phi,i) &= \mu\_{\pi}(\phi,i+1), \\
\mu\_{\pi}(\mathsf{F}\phi,i) &= \begin{cases} \mu\_{\pi}(\phi,i)\lor\_{3}\mu\_{\pi}(\mathsf{X}\mathsf{F}\phi,i) & \text{if } i \le |\pi|, \\ \mu\_{\pi}(\phi,i) & \text{if } i > |\pi|, \end{cases} \\
\mu\_{\pi}(\phi\_{1}\,\mathsf{U}\,\phi\_{2},i) &= \begin{cases} \mu\_{\pi}(\phi\_{2},i)\lor\_{3}\left(\mu\_{\pi}(\phi\_{1},i)\land\_{3}\mu\_{\pi}(\mathsf{X}(\phi\_{1}\,\mathsf{U}\,\phi\_{2}),i)\right) & \text{if } i \le |\pi|, \\ \mu\_{\pi}(\phi\_{2},i) & \text{if } i > |\pi|. \end{cases}
\end{array}$$

We note that the adapted semantics allows evaluating a finite trace in polynomial time, in contrast to [π |=<sup>3</sup> φ], which requires a PSPACE-complete algorithm. This improvement in complexity comes at a price – the adapted semantics cannot semantically characterize tautologies and contradictions. For example, μπ(p ∨ ¬p, 1) evaluates to ? for the empty word, despite the fact that p ∨ ¬p is semantically equivalent to ⊤. The novel semantics that we introduce in the following sections makes the same tradeoff.
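As a concrete illustration, the adapted semantics admits a direct recursive evaluator. Below is a Python sketch (the nested-tuple formula encoding and all names are ours, not from the paper); ⊤, ? and ⊥ are encoded as 1, 0 and −1, so that ¬3 is arithmetic negation and ∨3 is max, with ∧3 = min derived from ¬3 and ∨3 by De Morgan:

```python
BOT, UNK, TOP = -1, 0, 1   # encode ⊥ < ? < ⊤ as -1 < 0 < 1

def mu(phi, trace, i):
    """Adapted 3-valued semantics mu_pi(phi, i); trace is a list of
    sets of propositions, positions i are 1-based."""
    n, op = len(trace), phi[0]
    if op == 'p':                       # atomic proposition
        if i > n:
            return UNK                  # beyond the trace: unknown
        return TOP if phi[1] in trace[i - 1] else BOT
    if op == 'not':
        return -mu(phi[1], trace, i)    # flips ⊤/⊥, keeps ?
    if op == 'or':
        return max(mu(phi[1], trace, i), mu(phi[2], trace, i))
    if op == 'X':
        return mu(phi[1], trace, i + 1)
    if op == 'F':
        v = mu(phi[1], trace, i)
        return max(v, mu(phi, trace, i + 1)) if i <= n else v
    if op == 'U':
        v2 = mu(phi[2], trace, i)
        if i > n:
            return v2
        v1 = mu(phi[1], trace, i)
        return max(v2, min(v1, mu(phi, trace, i + 1)))
```

On the empty word, `mu` returns the unknown verdict for p ∨ ¬p, matching the tautology example above.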

In the following lemma, we relate the two three-valued semantics.

**Lemma 3.** *Given an LTL formula* φ *and a trace* π ∈ Π∗*,* |π| ≠ 0*, we have that*

$$\begin{array}{l} \mu\_{\pi}(\phi, 1) = \top \Rightarrow [\pi \models\_{3} \phi] = \top, \\\mu\_{\pi}(\phi, 1) = \bot \Rightarrow [\pi \models\_{3} \phi] = \bot. \end{array}$$

*Proof.* These two statements can be proven by induction on the structure of the LTL formula (see Appendix A.1 in [2]). The remaining implication, [π |=<sup>3</sup> φ] = ? ⇒ μπ(φ, 1) = ?, is a consequence of the first two.

#### **4 Counting Finitary Semantics for LTL**

In this section, we introduce the counting semantics for LTL. We first provide necessary definitions in Sect. 4.1, we present the new semantics in Sect. 4.2 and finally propose a predictive mapping that transforms the counting semantics into a qualitative 5-valued verdict in Sect. 4.3.

#### **4.1 Definitions**

Let N<sup>+</sup> = N<sup>0</sup> ∪ {∞, −} be the set of *natural* numbers (incl. 0) extended with the two special symbols ∞ (infinity) and − (impossible), such that for all n ∈ N<sup>0</sup> we have n < ∞ < −. We define the addition ⊕ of two elements a, b ∈ N<sup>+</sup> as follows.

**Definition 4 (Operator** ⊕**).** *We define the binary operator* ⊕ : N<sup>+</sup> × N<sup>+</sup> → N<sup>+</sup> *such that for* a, b ∈ N<sup>+</sup> *we have* a ⊕ b = a + b *if* a, b ∈ N<sup>0</sup> *and* a ⊕ b = max{a, b} *otherwise.*
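A small Python sketch of ⊕, where ∞ is represented by `math.inf` and − by a sentinel that compares above it (the representation and the names `oplus`, `INF`, `IMP` are our own, introduced for illustration):

```python
import math

INF = math.inf   # "∞": only an infinite continuation works
IMP = "-"        # "−": impossible; ordered above ∞ in n < ∞ < −

def _key(a):
    # comparison key realizing the total order n < ∞ < −
    return (1, 0) if a is IMP else (0, a)

def oplus(a, b):
    """a ⊕ b: ordinary addition on the naturals, otherwise the
    maximum in the order n < ∞ < − (Definition 4)."""
    if a is not IMP and a != INF and b is not IMP and b != INF:
        return a + b
    return max(a, b, key=_key)
```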

We denote by (s, f) a pair of two extended numbers s, f ∈ N<sup>+</sup>. In Definition 5, we introduce several operations on pairs: (1) the *swap* of the two values (∼), (2) the increment of both values by 1 (⊕1), (3) the *minmax* binary operation (⊔) that gives the pair consisting of the minimum first value and the maximum second value, and (4) the *maxmin* binary operation (⊓) that is symmetric to ⊔.

Definition 7 introduces the counting semantics for LTL that for a finite trace <sup>π</sup> and LTL formula <sup>φ</sup> gives a pair (s, f) <sup>∈</sup> <sup>N</sup><sup>+</sup> <sup>×</sup>N+. We call <sup>s</sup> and <sup>f</sup> *satisfaction* and *violation witness counts*, respectively. Intuitively, the s (f) value denotes the minimal number of additional steps that is needed to witness the satisfaction (violation) of the formula. The value ∞ is used to denote that the property can be satisfied (violated) only in an infinite number of steps, while − means the property cannot be satisfied (violated) by any continuation of the trace.

**Definition 5 (Operations** <sup>∼</sup>*,* <sup>⊕</sup>1*, ,* **).** *Given two pairs* (s, f) <sup>∈</sup> <sup>N</sup><sup>+</sup> <sup>×</sup> <sup>N</sup><sup>+</sup> *and* (s , f ) <sup>∈</sup> <sup>N</sup><sup>+</sup> <sup>×</sup> <sup>N</sup>+*, we have:*

$$\begin{array}{c} \sim (s, f) = (f, s), \\ (s, f) \oplus 1 = (s \oplus 1, f \oplus 1), \\ (s, f) \sqcup (s', f') = (\min(s, s'), \max(f, f')), \\ (s, f) \sqcap (s', f') = (\max(s, s'), \min(f, f')). \end{array}$$

*Example 6.* Given the pairs (0, 0), (∞, 1) and (7, −) we have the following:

$$\begin{array}{ccc} \sim(0,0)=(0,0), & \sim(\infty,1)=(1,\infty),\\ (0,0)\oplus1=(1,1), & (\infty,1)\oplus1=(\infty,2),\\ (0,0)\sqcup(\infty,1)=(0,1), & (\infty,1)\sqcup(7,-)=(7,-),\\ (0,0)\sqcap(\infty,1)=(\infty,0), & (\infty,1)\sqcap(7,-)=(\infty,1). \end{array}$$
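The pair operations of Definition 5 translate directly into Python; the sketch below (with ∞ as `math.inf`, − as a sentinel object, and all names ours) reproduces exactly the values of Example 6:

```python
import math

INF = math.inf   # "∞"
IMP = "-"        # "−" (impossible); compares above ∞ in n < ∞ < −

def _key(a):
    return (1, 0) if a is IMP else (0, a)

def emin(a, b): return min(a, b, key=_key)
def emax(a, b): return max(a, b, key=_key)

def swap(p):                        # ∼(s, f) = (f, s)
    return (p[1], p[0])

def inc(p):                         # (s, f) ⊕ 1
    bump = lambda a: a + 1 if a is not IMP and a != INF else a
    return (bump(p[0]), bump(p[1]))

def join(p, q):                     # (s, f) ⊔ (s', f') = (min s, max f)
    return (emin(p[0], q[0]), emax(p[1], q[1]))

def meet(p, q):                     # (s, f) ⊓ (s', f') = (max s, min f)
    return (emax(p[0], q[0]), emin(p[1], q[1]))
```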

**Remark.** Note that N<sup>+</sup> × N<sup>+</sup> forms a lattice where (s, f) ⊑ (s′, f′) when s ≥ s′ and f ≤ f′, with join ⊔ and meet ⊓. Intuitively, values that are larger in this order are closer to true.

#### **4.2 Semantics**

We now present our finitary semantics.

**Definition 7 (Counting finitary semantics).** *Let* π ∈ Π∗ *be a finite trace,* i ∈ N>0 *be a position in or outside the trace and* φ ∈ Φ *be an LTL formula. We define the counting finitary semantics of LTL as the function* dπ : Φ × Π∗ × N>0 → N+ × N+ *such that:*

$$\begin{array}{ll}
d_{\pi}(p,i) &= \begin{cases} (0,-) & \text{if } i \leq |\pi| \wedge p \in \pi_{i}, \\ (-,0) & \text{if } i \leq |\pi| \wedge p \notin \pi_{i}, \\ (0,0) & \text{if } i > |\pi|, \end{cases} \\
d_{\pi}(\neg\phi,i) &= {\sim} d_{\pi}(\phi,i), \\
d_{\pi}(\phi_{1}\vee\phi_{2},i) &= d_{\pi}(\phi_{1},i)\sqcup d_{\pi}(\phi_{2},i), \\
d_{\pi}(\mathsf{X}\phi,i) &= d_{\pi}(\phi,i+1)\oplus 1, \\
d_{\pi}(\phi\,\mathsf{U}\,\psi,i) &= \begin{cases} d_{\pi}(\psi,i)\sqcup\left(d_{\pi}(\phi,i)\sqcap d_{\pi}(\mathsf{X}(\phi\,\mathsf{U}\,\psi),i)\right) & \text{if } i \leq |\pi|, \\ d_{\pi}(\psi,i)\sqcup\left(d_{\pi}(\phi,i)\sqcap(-,\infty)\right) & \text{if } i > |\pi|, \end{cases} \\
d_{\pi}(\mathsf{F}\,\phi,i) &= \begin{cases} d_{\pi}(\phi,i)\sqcup d_{\pi}(\mathsf{X}\mathsf{F}\,\phi,i) & \text{if } i \leq |\pi|, \\ d_{\pi}(\phi,i)\sqcup(-,\infty) & \text{if } i > |\pi|. \end{cases}
\end{array}$$
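Definition 7 translates almost literally into a recursive evaluator. The sketch below is our own encoding, not the authors' implementation: a trace is a list of sets of propositions, a formula is a nested tuple such as `("U", ("ap", "r"), ("ap", "g"))`, positions are 1-based, and we assume ∞ and − absorb ⊕1.

```python
import math

INF, NEVER = math.inf, "-"             # extended naturals: 0 < ... < ∞ < '-'
def _k(v): return (2, 0) if v == NEVER else (1, 0) if v == INF else (0, v)
def _inc(v): return v if v in (INF, NEVER) else v + 1
def join(p, q): return (min(p[0], q[0], key=_k), max(p[1], q[1], key=_k))
def meet(p, q): return (max(p[0], q[0], key=_k), min(p[1], q[1], key=_k))

def d(pi, phi, i):
    """d_pi(phi, i) of Definition 7 on trace pi (list of sets), 1-based i."""
    op, n = phi[0], len(pi)
    if op == "ap":                     # atomic proposition
        if i > n:
            return (0, 0)
        return (0, NEVER) if phi[1] in pi[i - 1] else (NEVER, 0)
    if op == "not":
        s, f = d(pi, phi[1], i)
        return (f, s)                  # the swap operation ∼
    if op == "or":
        return join(d(pi, phi[1], i), d(pi, phi[2], i))
    if op == "X":
        s, f = d(pi, phi[1], i + 1)
        return (_inc(s), _inc(f))      # ⊕ 1
    if op in ("U", "F"):               # unroll one step, or (-, ∞) past the end
        if i <= n:
            s, f = d(pi, phi, i + 1)   # corresponds to X applied to phi itself
            tail = (_inc(s), _inc(f))
        else:
            tail = (NEVER, INF)
        if op == "F":
            return join(d(pi, phi[1], i), tail)
        return join(d(pi, phi[2], i), meet(d(pi, phi[1], i), tail))
    raise ValueError(f"unknown operator {op!r}")
```

On a small trace where g first holds at position 3, `F g` evaluates to (2, −) at position 1, matching the pattern described in Example 8 below.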

We now provide some motivations behind the above definitions.


*Example 8.* We refer to our motivating example from Table 1 and evaluate the trace τ2 with respect to the specification ψ. We present the outcome in Table 2. We see that every proposition evaluates to (0, −) when true: the satisfaction of a proposition that holds at time i is immediately witnessed and cannot be violated by any suffix. Similarly, a proposition evaluates to (−, 0) when false. The valuations of F g count the number of steps to positions in which g holds. For instance, the first time at which g holds is i = 3, hence F g evaluates to (2, −) at time 1, (1, −) at time 2 and (0, −) at time 3. We also note that F g evaluates to (0, ∞) at the end of the trace: it could be satisfied immediately by a continuation in which g holds, but violated only by an infinite suffix in which g never holds. We finally observe that G(r → F g) evaluates to (∞, ∞) at all positions: the property can be both satisfied and violated only with infinite suffixes.


**Table 2.** Unbounded response property example: dπ(φ, i) with the trace π = τ2.

We use "−" instead of "⊥" in the traces r and g to improve the readability.

Not all pairs (s, f) ∈ N+ × N+ are possible according to the counting semantics. We characterize the possible pairs in Lemma 9.

**Lemma 9.** *Let* π ∈ Π∗ *be a finite trace,* φ *an LTL formula and* i ∈ N>0 *an index. Then* dπ(φ, i) *is of the form* (a, −)*,* (−, a)*,* (b1, b2)*,* (b1, ∞)*,* (∞, b2) *or* (∞, ∞)*, where* a ≤ |π| − i *and* bj > |π| − i *for* j ∈ {1, 2}*.*

*Proof.* The proof can be obtained by structural induction on the LTL formula (see Appendix A.2 in [2]).

Finally, we relate our counting semantics to the three valued semantics in Lemma 10.

**Lemma 10.** *Given a finite trace* π ∈ Π∗*, an index* i ∈ N>0 *and an LTL formula* φ*, we have that*

$$\begin{array}{ll}
d_{\pi}(\phi,i) = (a,-) \leftrightarrow{} & \mu_{\pi}(\phi,i) = \top \\
 & \text{and } \nexists x < a \,.\, \pi' = \pi_{i} \cdot \pi_{i+1} \cdots \pi_{i+x},\ \mu_{\pi'}(\phi,1) = \top, \\
d_{\pi}(\phi,i) = (-,a) \leftrightarrow{} & \mu_{\pi}(\phi,i) = \bot \\
 & \text{and } \nexists x < a \,.\, \pi' = \pi_{i} \cdot \pi_{i+1} \cdots \pi_{i+x},\ \mu_{\pi'}(\phi,1) = \bot, \\
d_{\pi}(\phi,i) = (b_{1},b_{2}) \leftarrow{} & \mu_{\pi}(\phi,i) = {?},
\end{array}$$

*where* a ≤ |π| − i *and* bj *is either* ∞ *or* bj > |π| − i *for* j ∈ {1, 2}*.*

Intuitively, Lemma 10 holds because we introduce the symbol "−" only when a satisfaction (violation) is observed within the trace, and the values of a pair propagate only into the past (never into the future).

#### **4.3 Evaluation**

We now propose a mapping that predicts a qualitative verdict from our counting semantics. We adopt a 5-valued set consisting of true (⊤), presumably true (⊤P), inconclusive (?), presumably false (⊥P) and false (⊥) verdicts. We define the following order over these five values: ⊥ < ⊥P < ? < ⊤P < ⊤. We equip this 5-valued domain with the negation (¬) and disjunction (∨) operations, letting ¬⊤ = ⊥, ¬⊤P = ⊥P, ¬? = ?, ¬⊥P = ⊤P, ¬⊥ = ⊤ and φ1 ∨ φ2 = max{φ1, φ2}. We define the other Boolean operators by the usual logical equivalences (φ1 ∧ φ2 = ¬(¬φ1 ∨ ¬φ2), etc.).
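The 5-valued domain is a totally ordered set with an order-reversing negation, so it can be sketched compactly in Python (a hypothetical encoding; the names are ours):

```python
from enum import IntEnum

class V5(IntEnum):
    """The five verdicts, ordered ⊥ < ⊥P < ? < ⊤P < ⊤."""
    FALSE = 0
    PFALSE = 1
    UNKNOWN = 2
    PTRUE = 3
    TRUE = 4

def v_not(v):
    # Negation mirrors the order: ¬⊤ = ⊥, ¬⊤P = ⊥P, ¬? = ?, and so on.
    return V5(4 - v)

def v_or(a, b):
    # φ1 ∨ φ2 = max{φ1, φ2} with respect to the order above.
    return V5(max(a, b))

def v_and(a, b):
    # Conjunction via the usual equivalence ¬(¬a ∨ ¬b).
    return v_not(v_or(v_not(a), v_not(b)))
```

Because the order is total, conjunction defined this way coincides with the minimum of the two operands.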

We evaluate a property on a trace to ⊤ (⊥) when the satisfaction (violation) can be fully determined from the trace, following the definition of the three-valued semantics μ. Intuitively, this takes care of the case in which the safety (co-safety) part of a formula has been violated (satisfied), at least for properties that are intentionally safe (respectively, intentionally co-safe) [10].

Whenever the truth value is not determined, we distinguish whether dπ(φ, i) indicates the possibility of a satisfaction (respectively violation) in finite time or not. For possible satisfactions (violations) in finite time, we make a prediction on whether past observations support the belief that the trace is going to satisfy or violate the property. If the predictions are neither inconclusive nor contradictory, then we evaluate the trace to the presumable truth value ⊤P or ⊥P. If we cannot make a prediction, we compute the truth value recursively, based on the operator in the formula and the truth values of the subformulas (with temporal operators unrolled).

We use the predicate predπ to give the prediction based on the observed witnesses for satisfaction. The predicate predπ(φ, i) evaluates to ? when no witness for satisfaction exists in the past. When there exists a witness that requires at least as many additional steps as the trace under evaluation, the predicate evaluates to ⊤. If all existing witnesses (and at least one exists) are shorter than the current trace, then the predicate evaluates to ⊥. For a prediction on the violation we make a prediction on the satisfaction of ¬φ, i.e., we compute predπ(¬φ, i).

**Definition 11 (Prediction predicate).** *Let* s, f *denote natural numbers and let* sπ(φ, i), fπ(φ, i) ∈ N+ *be such that* dπ(φ, i) = (sπ(φ, i), fπ(φ, i))*. We define the* 3*-valued predicate pred*π *as*

$$pred_{\pi}(\phi, i) = \begin{cases} \top & \text{if } \exists j < i \,.\, d_{\pi}(\phi, j) = (s', -) \text{ and } s_{\pi}(\phi, i) \le s', \\ ? & \text{if } \nexists j < i \,.\, d_{\pi}(\phi, j) = (s', -), \\ \bot & \text{if } \exists j < i \,.\, d_{\pi}(\phi, j) = (s', -) \text{ and } \\ & \quad s_{\pi}(\phi, i) > \max_{0 \le j < i} \{s' \mid d_{\pi}(\phi, j) = (s', -)\}. \end{cases}$$
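The predicate is easy to sketch once the past pairs are available. Below is our own hypothetical helper, which relies on Lemma 9: every satisfaction witness is a pair of the form (s′, −) with finite s′ (we again encode − as the string `"-"`).

```python
def pred(past_pairs, s_now):
    """Sketch of pred_pi(phi, i): past_pairs holds the pairs d_pi(phi, j)
    for all j < i; s_now is the current satisfaction count s_pi(phi, i).
    Returns 'T' (top), '?' or 'F' (bottom)."""
    # By Lemma 9, satisfaction witnesses are exactly the pairs (s', '-').
    witnesses = [s for (s, f) in past_pairs if f == "-"]
    if not witnesses:
        return "?"        # no witness for satisfaction in the past
    if any(s_now <= s for s in witnesses):
        return "T"        # some past witness needed at least as many steps
    return "F"            # all past witnesses were shorter than the current count
```

For example, with past pairs [(2, "-"), ("-", 0)] and a current count of 1, the only satisfaction witness needed 2 steps, so the prediction is ⊤.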

For the evaluation we consider a case split among the possible combinations of values in the pairs.

**Definition 12 (Predictive evaluation).** *We define the* predictive evaluation *function* eπ(φ, i)*, with* a ≤ |π| − i *and* bj > |π| − i *for* j ∈ {1, 2} *and* a, bj ∈ N0*, for the different cases of* dπ(φ, i)*:*


where rπ(φ, i) is an auxiliary function defined inductively as follows:

$$\begin{aligned} r_{\pi}(p,i)&=?\\ r_{\pi}(\neg\phi,i)&=\neg e_{\pi}(\phi,i)\\ r_{\pi}(\phi_{1}\lor\phi_{2},i)&=e_{\pi}(\phi_{1},i)\lor e_{\pi}(\phi_{2},i)\\ r_{\pi}(\mathsf{X}^{n}\phi,i)&=e_{\pi}(\phi,i+n)\\ r_{\pi}(\mathsf{F}\,\phi,i)&=\begin{cases} e_{\pi}(\phi,i)\lor r_{\pi}(\mathsf{X}\mathsf{F}\,\phi,i) & \text{if } i\leq|\pi|\\ e_{\pi}(\phi,i) & \text{if } i>|\pi|\end{cases}\\ r_{\pi}(\phi_{1}\,\mathsf{U}\,\phi_{2},i)&=\begin{cases} e_{\pi}(\phi_{2},i)\lor(e_{\pi}(\phi_{1},i)\land e_{\pi}(\mathsf{X}(\phi_{1}\,\mathsf{U}\,\phi_{2}),i)) & \text{if } i\leq|\pi|\\ e_{\pi}(\phi_{2},i) & \text{if } i>|\pi|\end{cases}\end{aligned}$$

The predictive evaluation function is symmetric, i.e., eπ(φ, i) = ¬eπ(¬φ, i) holds.

*Example 13.* The outcome of evaluating τ2 from Table 1 is shown in Table 3. The subformula r → F g is predicted to be ⊤P at i = 7 because there exists a longer witness for satisfaction in the past (e.g., at i = 1). Thus, the trace evaluates to ⊤P, as expected.

In Fig. 1 we visualize the evaluation of a pair dπ(φ, i) = (s, f) for a fixed φ and a fixed position i. On the x-axis is the witness count s for a satisfaction and on the y-axis the witness count f for a violation. For a value s, respectively f, that is smaller than the length of the suffix starting at position i (with the other value of the pair always being −), the evaluation is ⊤ or ⊥, respectively. Otherwise the evaluation depends on the values smax and fmax. These two values


**Table 3.** Unbounded response property example with π = τ2.

We use "−" instead of "⊥" in the traces r and g to improve the readability.

represent the largest witness counts for a satisfaction and a violation in the past, i.e., for positions smaller than i in the trace. Based on the prediction function predπ(φ, i), the evaluation becomes ⊤P, ? or ⊥P, where ? indicates that the auxiliary function rπ(φ, i) has to be applied. Starting at an arbitrary point in the diagram and moving to the right increases the witness count for a satisfaction while the witness count for a violation remains constant; thus, moving to the right makes the pair "more false". The same holds when keeping the witness count for a satisfaction constant and moving up in the diagram, as this decreases the witness count for a violation. Analogously, moving down and/or left makes the pair "more true", as the witness count for a violation gets larger and/or the witness count for a satisfaction gets smaller.

Our 5-valued predictive evaluation refines the 3-valued LTL semantics.

**Theorem 14.** *Let* φ *be an LTL formula,* π ∈ Π∗ *a finite trace and* i ∈ N>0 *an index. We have*

$$\begin{array}{l} \mu_{\pi}(\phi,i) = \top \leftrightarrow e_{\pi}(\phi,i) = \top, \\ \mu_{\pi}(\phi,i) = \bot \leftrightarrow e_{\pi}(\phi,i) = \bot, \\ \mu_{\pi}(\phi,i) = {?} \leftrightarrow e_{\pi}(\phi,i) \in \{\top_{P}, \bot_{P}, ?\}. \end{array}$$

Theorem 14 holds because the evaluation to ⊤ and ⊥ is simply the mapping of a pair that contains the symbol "−", as shown in Lemma 10.

Remember that N+ × N+ is partially ordered by ⊑. We now show that having a trace that is "more true" than another is correctly reflected in our finitary semantics. To define "more true", we first need the polarity of a proposition in an LTL formula.

*Example 15.* Note that g has positive polarity in φ = G(r → F g). If we define τ′2 to be as τ2, except that g ∈ τ′2(i) for i ∈ {1, ..., 6}, we have eτ′2(φ, i) = ⊥P, whereas eτ2(φ, i) = ⊤P.

**Fig. 1.** Lattice for (s, f) with φ and i < |π| fixed.

**Definition 16 (Polarity).** *Let* #¬ *be the number of negation operators on a specific path from the root of the parse tree of* φ *to a leaf. We define the polarity of a proposition* p *in an LTL formula* φ *as the function pol*(p) *as follows:*

$$pol(p) = \begin{cases} pos & \text{if } \#\neg \text{ is even on all paths to a leaf with proposition } p, \\ neg & \text{if } \#\neg \text{ is odd on all paths to a leaf with proposition } p, \\ mixed & \text{otherwise.} \end{cases}$$
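Definition 16 amounts to counting negations along root-to-leaf paths in the parse tree. A sketch in Python (our own nested-tuple encoding of formulas, e.g. `("not", ("ap", "g"))`, not the authors' implementation):

```python
def polarity(phi, p, neg=0):
    """Polarity of proposition p in formula phi: 'pos', 'neg', 'mixed',
    or None if p does not occur in phi. neg counts negations on the path."""
    if phi[0] == "ap":
        if phi[1] != p:
            return None
        return "pos" if neg % 2 == 0 else "neg"
    bump = 1 if phi[0] == "not" else 0
    # Collect the polarities found in all subformulas (tuples among phi[1:]).
    seen = {polarity(sub, p, neg + bump)
            for sub in phi[1:] if isinstance(sub, tuple)}
    seen.discard(None)
    if not seen:
        return None
    return seen.pop() if len(seen) == 1 else "mixed"
```

For instance, in ¬(r ∨ ¬g) the proposition g sits under two negations (positive polarity) while r sits under one (negative polarity).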

With the polarity defined, we now define when a trace is "more true" than another with respect to an LTL formula φ.

**Definition 17 (**π ⊑φ π′**).** *Given two traces* π *and* π′ *of equal length and an LTL formula* φ *over propositions* p*, we define that* π ⊑φ π′ *iff*

$$\begin{array}{l}\forall i\, \forall p:\ pol(p) = \textit{mixed} \Rightarrow (p \in \pi_{i} \leftrightarrow p \in \pi'_{i})\ \textit{and} \\ \phantom{\forall i\, \forall p:\ } pol(p) = \textit{pos} \Rightarrow (p \in \pi_{i} \rightarrow p \in \pi'_{i})\ \textit{and} \\ \phantom{\forall i\, \forall p:\ } pol(p) = \textit{neg} \Rightarrow (p \in \pi_{i} \leftarrow p \in \pi'_{i}). \end{array}$$
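Definition 17 is a pointwise check over the two traces. A sketch under our own encoding (traces as lists of sets of propositions, polarities as a map computed per Definition 16; the helper name is ours):

```python
def more_true(pi1, pi2, pol):
    """Check pi1 ⊑_phi pi2 for equal-length traces, given a polarity map
    pol: proposition -> 'pos' | 'neg' | 'mixed'."""
    assert len(pi1) == len(pi2)
    for s1, s2 in zip(pi1, pi2):
        for p, pl in pol.items():
            if pl == "mixed" and ((p in s1) != (p in s2)):
                return False   # mixed polarity: traces must agree on p everywhere
            if pl == "pos" and p in s1 and p not in s2:
                return False   # positive polarity: pi2 may only add p
            if pl == "neg" and p in s2 and p not in s1:
                return False   # negative polarity: pi2 may only remove p
    return True
```

For example, with g of positive polarity, a trace in which g holds at strictly more positions is "more true" than the original.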

Whenever one trace is "more true" than another, this is correctly reflected in our finitary semantics.

**Theorem 18.** *For two traces* π *and* π′ *of equal length and an LTL formula* φ*, we have that*

$$
\pi \sqsubseteq_{\phi} \pi' \Rightarrow d_{\pi'}(\phi, 1) \sqsupseteq d_{\pi}(\phi, 1).
$$

*Therefore, for* π ⊑φ π′ *we have that*

$$\begin{aligned} e_{\pi}(\phi, 1) = \top &\Rightarrow e_{\pi'}(\phi, 1) = \top, \text{ and} \\ e_{\pi}(\phi, 1) = \bot &\Leftarrow e_{\pi'}(\phi, 1) = \bot. \end{aligned}$$

Theorem 18 holds because replacing an arbitrary observed value in π by one with positive polarity in π′ always results, with dπ(φ, 1) = (s, f) and dπ′(φ, 1) = (s′, f′), in s′ ≤ s and f′ ≥ f: for π ⊑φ π′, the trace π′ witnesses a satisfaction of φ no later than π, and π′ witnesses a violation of φ no earlier than π.


**Table 4.** Making a system "more true".

In Table 4 we give examples illustrating the transition from one evaluation to another. Note that it is possible to change from ⊤P to ⊥P. However, it is only the predicted truth value that becomes "worse", because we have strengthened the prefix on which the prediction is based; the values of dπ(φ, i) do not change in such a case.

#### **5 Examples**

We demonstrate the strengths and weaknesses of our approach on the examples of LTL specifications and traces shown in Table 5. We fully develop these examples in Appendix B in [2].


**Table 5.** Examples of LTL specifications and traces

Table 6 summarizes the evaluation of our examples. The first and the second column denote the evaluated specification and trace. We use these examples to compare LTL with counting semantics (c-LTL), presented in this paper, to the two other popular finitary LTL interpretations: the 3-valued LTL semantics [4] (3-LTL) and LTL on truncated paths [9] (t-LTL). We recall that in t-LTL there is a distinction between a weak and a strong next operator. We denote by t-LTL-s (t-LTL-w) the specifications from our examples in which X is interpreted as the strong (weak) next operator, and assume that we always give a strong interpretation to U and F and a weak interpretation to G.


**Table 6.** Comparison of different verdicts with different semantics

There are two immediate observations that we can make regarding the results presented in Table 6. First, 3-valued LTL gives an *inconclusive* verdict for all the examples, feedback that has little value to a verification engineer. Second, the verdicts of c-LTL and t-LTL can differ considerably, which is not surprising given the different strategies for interpreting the unseen future. We now comment further on these examples, explaining the results in more detail and highlighting the intuitive outcomes of c-LTL for a large class of interesting LTL specifications.

*Effect of Nested Next.* With ψ1 and ψ2 we evaluate the effect of nesting X in an F and a G formula, respectively. We make a prediction on Xg at the end of the trace before evaluating F and G. As a consequence, we find that (ψ1, π1) evaluates to presumably false, while (ψ2, π2) evaluates to presumably true. In t-LTL, this class of specifications is very sensitive to the weak/strong interpretation of next, as we can see from the verdicts.

*Request/Grants.* We evaluate the request/grant property ψ<sup>3</sup> from the motivating example on the trace π3. We observe that r at cycle 2 is followed by g at cycle 3, while r at cycle 5 is not followed by g at cycle 6. Hence, (ψ3, π3) evaluates to presumably false.

*Concurrent Request/Grants.* We evaluate the specification ψ<sup>4</sup> against the trace π4. In this example r<sup>1</sup> is triggered at even time stamps and r<sup>2</sup> is triggered at odd time stamps. Every request is granted in one cycle. It follows that regardless of the time when the trace ends, there is one request that is not granted yet. We note that ψ<sup>4</sup> is a conjunction of two basic request/grant properties and we make independent predictions for each conjunct. Every basic request/grant property is evaluated to presumably true, hence (ψ4, π4) evaluates to presumably true. At this point, we note that in t-LTL, every request that is not granted by the end of the trace results in the property violation, regardless of the past observations.

*Until.* We use the specification ψ5 and the trace π5 to evaluate the effect of U on the predictions. The specification requires that Xr continuously holds until XXg becomes true. We can see that in π5, Xr is witnessed at cycles 1–4, while XXg is witnessed at cycle 5. We can also see that Xr is again witnessed from cycle 6 until the end of the trace at cycle 8. As a consequence, (ψ5, π5) evaluates to presumably true.

*Stabilization.* The specification ψ6 says that the value of g has to eventually stabilize to either true or false. We evaluate the formula on two traces, π6 and π7. In the trace π6, g alternates between true and false every two cycles and becomes true in the last cycle. Hence, there is no sufficiently long witness of stabilization and (ψ6, π6) evaluates to presumably false. In the trace π7, g also alternates between true and false every two cycles, but in the last four cycles g remains continuously true. As a consequence, (ψ6, π7) evaluates to presumably true. This example also illustrates the importance of where the trace truncation occurs: if both π6 and π7 were truncated at cycle 5, both (ψ6, π6) and (ψ6, π7) would evaluate to presumably false. We note that ψ6 is satisfied by all traces in t-LTL.

*Sub-formula Domination.* The specification ψ7 exposes a weakness of our approach. It requires that in every cycle, either r or g is witnessed in some unbounded future. With our approach, (ψ7, π8) evaluates to presumably false. This is against our intuition, because we have observed that g becomes regularly true every second time step. However, in this example our prediction for F r dominates over the prediction for F g, leading to the unexpected presumably false verdict. On the other hand, the t-LTL interpretation of the same specification depends only on the last values of r and g.

*Semantically Equivalent Formulas.* We now demonstrate that our approach may give different answers for semantically equivalent formulas. For instance, both ψ8 and ψ9 are semantically equivalent to ψ7. We have that (ψ8, π8) evaluates to presumably false, while (ψ9, π8) evaluates to presumably true. We note that t-LTL verdicts are stable across semantically equivalent formulas.

#### **6 Conclusion**

We have presented a novel finitary semantics for LTL that uses the history of satisfaction and violation in a finite trace to predict whether the co-safety and safety aspects of a formula will be satisfied in the extension of the trace to an infinite one. We claim that the semantics closely follows human intuition when predicting the truth value of a trace. The presented examples (including non-monitorable LTL properties) illustrate our approach and support this claim.

Our definition of the semantics is trace-based, but it is easily extended to take an entire database of traces into account, which may make the approach more precise. Our approach currently uses a very simple form of learning to predict the future. We would like to consider more sophisticated statistical methods to make better predictions. In particular, we plan to apply nonparametric statistical methods (e.g., the Wilcoxon signed-rank test [16]), in combination with our counting semantics, to identify and quantify the traces that are outliers.

#### **References**


7180, pp. 304–319. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3- 642-28717-6 24


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Tools

### **Rabinizer 4: From LTL to Your Favourite Deterministic Automaton**

Jan Křetínský(B), Tobias Meggendorfer, Salomon Sickert, and Christopher Ziegler

Technical University of Munich, Munich, Germany jan.kretinsky@gmail.com, {meggendo,sickert}@in.tum.de

**Abstract.** We present Rabinizer 4, a tool set for translating formulae of linear temporal logic to different types of deterministic ω-automata. The tool set implements and optimizes several recent constructions, including the first implementation translating the frequency extension of LTL. Further, we provide a distribution of PRISM that links Rabinizer and offers model-checking procedures for probabilistic systems that are not in the official PRISM distribution. Finally, we evaluate the performance and, in cases where previous implementations exist, we show enhancements both in terms of the size of the automata and the computation time, due to algorithmic as well as implementation improvements.

#### **1 Introduction**

**The automata-theoretic approach** [VW86] is a key technique for verification and synthesis of systems with linear-time specifications, such as formulae of linear temporal logic (LTL) [Pnu77]. It proceeds in two steps: first, the formula is translated into a corresponding automaton; second, the product of the system and the automaton is further analyzed. The size of the automaton is important as it directly affects the size of the product and thus largely also the analysis time, particularly for deterministic automata and probabilistic model checking, where the proportion is very direct. For verification of non-deterministic systems, mostly non-deterministic Büchi automata (NBA) are used [EH00,SB00,GO01,GL02,BKŘS12,DLLF+16] since they are typically very small and easy to produce.

**Probabilistic LTL model checking** cannot profit directly from NBA. Even the qualitative question, whether a formula holds with probability 0 or 1, requires automata with at least a restricted form of determinism. Prime examples are the limit-deterministic (also called semi-deterministic) Büchi automata (LDBA) [CY88] and the generalized LDBA (LDGBA). However, for the general quantitative questions, where the probability of satisfaction is computed, general limit-determinism is not sufficient. Instead, deterministic Rabin automata (DRA) have

This work has been partially supported by the Czech Science Foundation grant No. P202/12/G061 and the German Research Foundation (DFG) project KR 4890/1-1 "Verified Model Checkers" (317422601). A part of the frequency extension has been implemented within Google Summer of Code 2016.

c The Author(s) 2018

H. Chockler and G. Weissenbacher (Eds.): CAV 2018, LNCS 10981, pp. 567–577, 2018. https://doi.org/10.1007/978-3-319-96145-3\_30

**Fig. 1.** LTL translations to different types of automata. Translations implemented in Rabinizer 4 are indicated with a solid line. The traditional approaches are depicted as dotted arrows. The determinization of NBA to DRA is implemented in ltl2dstar [Kle], to LDBA in Seminator [BDK+17] and to (mostly) DPA in spot [DLLF+16].

been mostly used [KNP11], and recently also deterministic generalized Rabin automata (DGRA) [CGK13]. In principle, all standard types of deterministic automata are applicable here, except for deterministic Büchi automata (DBA), which are not as expressive as LTL. However, other types of automata, such as deterministic Muller and deterministic parity automata (DPA), are typically larger than DGRA in terms of acceptance condition or state space, respectively.<sup>1</sup> Recently, several approaches with specific LDBA were proved applicable to the quantitative setting [HLS+15,SEJK16] and competitive with DGRA. Besides, model checking MDP against LTL properties involving frequency operators [BDL12] also allows for an automata-theoretic approach, via deterministic generalized Rabin mean-payoff automata (DGRMA) [FKK15].

**LTL synthesis** can also be solved using the automata-theoretic approach. Although DRA and DGRA transformed into games can be used here, the algorithms for the resulting Rabin games [PP06] are not very efficient in practice. In contrast, DPA may be larger, but in this setting they are the automata of choice due to the good practical performance of parity-game solvers [FL09,ML16,JBB+17].

**Types of Translations.** The translations of LTL to NBA, e.g., [VW86], are typically *"semantic"* in the sense that each state is given by a set of logical formulae and the language of the state can be captured in terms of the semantics of these formulae. In contrast, the determinization of Safra [Saf88] and its improvements [Pit06,Sch09,TD14,FL15] are not "semantic": they ignore this structure and produce trees as the new states, which lack a logical interpretation. As a result, if we apply Safra's determinization to semantically created NBA, we obtain DRA that lack the structure and, moreover, are unnecessarily large since the construction cannot utilize the original structure. In contrast, the

<sup>1</sup> Note that every DGRA can be written as a Muller automaton on the same state space with an exponentially-sized acceptance condition, and DPA are a special case of DRA and thus DGRA.

recent works [KE12,KLG13,EK14,KV15,SEJK16,EKRS17,MS17,KV17] provide "semantic" constructions, often producing smaller automata. Furthermore, various transformations such as degeneralization [KE12], index appearance record [KMWW17] or determinization of limit-deterministic automata [EKRS17] preserve the semantic description, allowing for further optimizations of the resulting automata.

**Our Contribution.** While all previous versions of Rabinizer [GKE12,KLG13, KK14] featured only the translation LTL→DGRA→DRA, Rabinizer 4 now implements all the translations depicted by the solid arrows in Fig. 1. It improves all these translations, both algorithmically and implementation-wise, and moreover, features the first implementation of the translation of a frequency extension of LTL [FKK15].

Further, in order to utilize the resulting automata for verification, we provide our own distribution<sup>2</sup> of the PRISM model checker [KNP11], which allows for model checking MDP against LTL using not only DRA and DGRA, but also using LDBA and against frequency LTL using DGRMA. Finally, the tool can turn the produced DPA into parity games between the players with input and output variables. Therefore, when linked to parity-game solvers, Rabinizer 4 can be also used for LTL synthesis.

Rabinizer 4 is freely available at http://rabinizer.model.in.tum.de together with an on-line demo, visualization, usage instructions and examples.

#### **2 Functionality**

We recall that the previous version Rabinizer 3 has the following functionality:


#### **2.1 Translations**

Rabinizer 4 inputs formulae of LTL and outputs automata in the standard HOA format [BBD+15], which is used, e.g., as the input format in PRISM. Automata in the HOA format can be directly visualized, displaying the "semantic" description of the states. Rabinizer 4 features the following command-line tools for the respective translations depicted as the solid arrows in Fig. 1:

**ltl2dgra** and **ltl2dra** correspond to the original functionality of Rabinizer 3, i.e., they translate LTL (now with the extended syntax, including all common temporal operators) to DGRA and DRA [EK14], respectively.

<sup>2</sup> Merging these features into the public release of PRISM as well as linking the new version of Rabinizer is subject to current collaboration with the authors of PRISM.

	- The default mode uses the translation to LDBA, followed by an LDBA-to-DPA determinization [EKRS17] specially tailored to LDBA with the "semantic" labelling of states, avoiding an additional exponential blow-up of the resulting automaton.
	- The alternative mode uses the translation to DRA, followed by our improvement of the index appearance record of [KMWW17].

#### **2.2 Verification and Synthesis**

The resulting automata can be used for model checking probabilistic systems and for LTL synthesis. To this end, we provide our own distribution of the probabilistic model checker PRISM as well as a procedure transforming automata into games to be solved.

**Model checking: PRISM distribution.** For model checking Markov chains and Markov decision processes, PRISM [KNP11] uses DRA and recently also more efficient DGRA [CGK13,KK14]. Our distribution, which links Rabinizer, additionally features model checking using the LDBA [SEJK16, SK16] that are created by our **ltl2ldba**.

Further, the distribution provides an implementation of frequency LTL\GU model checking, using DGRMA. To the best of our knowledge, there are no other implemented procedures for logics with frequency. Here, techniques of linear programming for multi-dimensional mean-payoff satisfaction [CKK15] and the model-checking procedure of [FKK15] are implemented and applied.

**Synthesis: Games.** The automata-theoretic approach to LTL synthesis requires transforming the LTL formula into a game between the input and output players. We provide this transformer and thus an end-to-end LTL synthesis solution, provided a respective game solver is linked. Since current solutions to Rabin games are not very efficient, we implemented a transformation of DPA into parity games and a serialization to the format of PG Solver [FL09]. Due to the explicit serialization, we foresee the main use in quick prototyping.

<sup>3</sup> The *frequential globally* construct [BDL12,BMM14] **G**<sup>∼ρ</sup>ϕ with ∼ ∈ {≥, >, ≤, <}, ρ ∈ [0, 1] intuitively means that the fraction of positions satisfying ϕ satisfies ∼ρ. Formally, the fraction on an infinite run is defined using the long-run average [BMM14].
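To illustrate the footnote's semantics on an ultimately periodic word u·v<sup>ω</sup>: the long-run average of the indicator of ϕ equals its fraction within the loop v, so on such words only the loop matters. The sketch below is our own hypothetical helper, not part of Rabinizer.

```python
from fractions import Fraction
import operator

def freq_globally(loop, holds, cmp, rho):
    """Evaluate G_{~rho} phi on an ultimately periodic word u·v^omega:
    the long-run average of the indicator of phi equals its fraction
    within the loop v, so the finite prefix u is irrelevant and only
    the loop is passed in."""
    frac = Fraction(sum(1 for step in loop if holds(step)), len(loop))
    return cmp(frac, rho)
```

For example, a run looping on a·a·b satisfies **G**<sup>≥1/2</sup>a, since the long-run fraction of a-positions is 2/3.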

#### **3 Optimizations, Implementation, and Evaluation**

Compared to the theoretical constructions and previous implementations, there are numerous improvements, heuristics, and engineering enhancements. We evaluate the improvements both in terms of the size of the resulting automaton and the running time. When comparing with respect to the original Rabinizer functionality, we compare our implementation **ltl2dgra** to the previous version Rabinizer 3.1, which is already a significantly faster re-implementation [EKS16] of the official release Rabinizer 3 [KK14]. All benchmarks have been executed on a host with an i7-4700MQ CPU (4×2.4 GHz), running Linux 4.9.0-5-amd64 and the Oracle JRE 9.0.4+11 JVM. Due to the start-up time of the JVM, all times below 2 s are denoted by <2 and not specified more precisely. All experiments were given a time-out of 900 s and a mem-out of 4 GB, denoted by −.

#### **Algorithmic improvements and heuristics** for each of the translations:


Besides, we add an option to generate a non-deterministic initial component for the LDBA instead of a deterministic one. Although the LDBA is then no longer suitable for quantitative probabilistic model checking, it still suffices for qualitative model checking. At the same time, it can be much smaller; see Table 4, which shows a significant improvement on the particular formula.


ties are not resolved. Consequently, it cannot happen that an irrelevant tie is resolved in two different ways, as in [KMWW17], thus effectively merging such states.

**Table 1.** Effect of simplifications and suspension for **ltl2dgra** on the formulae ψ<sub>i</sub> = **G**φ<sub>i</sub> where φ<sub>1</sub> = a<sub>1</sub>, φ<sub>i</sub> = (a<sub>i</sub> **U** (**X**φ<sub>i−1</sub>)), and ψ′<sub>i</sub> = **G**φ′<sub>i</sub> where φ′<sub>1</sub> = a<sub>1</sub>, φ′<sub>i</sub> = (φ′<sub>i−1</sub> **U** (**X**<sup>i</sup> a<sub>i</sub>)), displaying execution time in seconds/#states.


**Table 2.** Effect of computing acceptance sets per SCC on formulae ψ<sub>1</sub> = x<sub>1</sub> ∧ φ<sub>1</sub>, ψ<sub>2</sub> = (x<sub>1</sub> ∧ φ<sub>1</sub>) ∨ (¬x<sub>1</sub> ∧ φ<sub>2</sub>), ψ<sub>3</sub> = (x<sub>1</sub> ∧ x<sub>2</sub> ∧ φ<sub>1</sub>) ∨ (¬x<sub>1</sub> ∧ x<sub>2</sub> ∧ φ<sub>2</sub>) ∨ (x<sub>1</sub> ∧ ¬x<sub>2</sub> ∧ φ<sub>3</sub>), . . . , where φ<sub>i</sub> = **XG**((a<sub>i</sub> **U** b<sub>i</sub>) ∨ (c<sub>i</sub> **U** d<sub>i</sub>)), displaying execution time in seconds/#acceptance sets.


**Table 3.** Effect of break-point elimination for **ltl2ldba** on safety formulae s(n, m) = ⋀<sub>i=1</sub><sup>n</sup> **G**(a<sub>i</sub> ∨ **X**<sup>m</sup>b<sub>i</sub>) and for **ltl2ldgba** on liveness formulae l(n, m) = ⋀<sub>i=1</sub><sup>n</sup> **GF**(a<sub>i</sub> ∧ **X**<sup>m</sup>b<sub>i</sub>), displaying #states (#Büchi conditions).


**Table 4.** Effect of non-determinism of the initial component for **ltl2ldba** on formulae f(i) = **F**(a ∧ **X**<sup>i</sup>**G**b), displaying #states (#Büchi conditions).


**Table 5.** Comparison of the average performance with the previous version of Rabinizer. The statistics are taken over a set of 200 standard formulae [KMS18] used, e.g., in [BKS13,EKS16], run in a batch mode for both tools to eliminate the effect of the JVM start-up overhead.


**Implementation.** The main performance bottleneck of the older implementations is that explicit data structures for the transition system are not efficient for larger alphabets. To this end, Rabinizer 3.1 provided a symbolic (BDD) representation of states and edge labels. On top of that, Rabinizer 4 represents the transition function symbolically, too.

Besides, there are further engineering improvements on issues such as storing the acceptance condition only as a local edge labelling, caching, data-structure overheads, SCC-based divide-and-conquer constructions, or the introduction of parallelization for batch inputs.

**Average Performance Evaluation.** We have already illustrated the improvements on several hand-crafted families of formulae. In Tables 1 and 2 we have even seen the respective running-time speed-ups. As the basis for the overall evaluation of the improvements, we use several established datasets from the literature, see [KMS18], altogether two hundred formulae. The results in Table 5 indicate that the performance also improved on average across the more realistic formulae.

#### **4 Conclusion**

We have presented Rabinizer 4, a tool set to translate LTL to various deterministic automata and to use them in probabilistic model checking and in synthesis. The tool set extends the previous functionality of Rabinizer, improves on previous translations, and also gives the very first implementations of frequency LTL translation as well as model checking. Finally, the tool set is also more user-friendly due to richer input syntax, its connection to PRISM and PGSolver, and the on-line version with direct visualization, which can be found at http://rabinizer.model.in.tum.de.

#### **References**

- [BBD+15] Babiak, T., et al.: The Hanoi omega-automata format. In: Kroening, D., Păsăreanu, C.S. (eds.) CAV 2015. LNCS, vol. 9206, pp. 479–486. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-21690-4_31

- [BDK+17] Blahoudek, F., Duret-Lutz, A., Klokočka, M., Křetínský, M., Strejček, J.: Seminator: a tool for semi-determinization of omega-automata. In: LPAR, pp. 356–367 (2017)

- [BDL12] Bollig, B., Decker, N., Leucker, M.: Frequency linear-time temporal logic. In: TASE, pp. 85–92 (2012)

- [BKRS12] Babiak, T., Křetínský, M., Řehák, V., Strejček, J.: LTL to Büchi automata translation: fast and more deterministic. In: Flanagan, C., König, B. (eds.) TACAS 2012. LNCS, vol. 7214, pp. 95–109. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-28756-5_8

- [BKS13] Blahoudek, F., Křetínský, M., Strejček, J.: Comparison of LTL to deterministic Rabin automata translators. In: McMillan, K., Middeldorp, A., Voronkov, A. (eds.) LPAR 2013. LNCS, vol. 8312, pp. 164–172. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-45221-5_12

- [BMM14] Bouyer, P., Markey, N., Matteplackel, R.M.: Averaging in LTL. In: Baldan, P., Gorla, D. (eds.) CONCUR 2014. LNCS, vol. 8704, pp. 266–280. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-44584-6_19

- [CGK13] Chatterjee, K., Gaiser, A., Křetínský, J.: Automata with generalized Rabin pairs for probabilistic model checking and LTL synthesis. In: Sharygina, N., Veith, H. (eds.) CAV 2013. LNCS, vol. 8044, pp. 559–575. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-39799-8_37

- [CKK15] Chatterjee, K., Komárková, Z., Křetínský, J.: Unifying two views on multiple mean-payoff objectives in Markov decision processes. In: LICS, pp. 244–256 (2015)

- [CY88] Courcoubetis, C., Yannakakis, M.: Verifying temporal properties of finite-state probabilistic programs. In: FOCS, pp. 338–345 (1988)

- [EH00] Etessami, K., Holzmann, G.J.: Optimizing Büchi automata. In: Palamidessi, C. (ed.) CONCUR 2000. LNCS, vol. 1877, pp. 153–168. Springer, Heidelberg (2000). https://doi.org/10.1007/3-540-44618-4_13

- [EK14] Esparza, J., Křetínský, J.: From LTL to deterministic automata: a safraless compositional approach. In: Biere, A., Bloem, R. (eds.) CAV 2014. LNCS, vol. 8559, pp. 192–208. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-08867-9_13

- [EKRS17] Esparza, J., Křetínský, J., Raskin, J.-F., Sickert, S.: From LTL and limit-deterministic Büchi automata to deterministic parity automata. In: Legay, A., Margaria, T. (eds.) TACAS 2017. LNCS, vol. 10205, pp. 426–442. Springer, Heidelberg (2017). https://doi.org/10.1007/978-3-662-54577-5_25

- [FL09] Friedmann, O., Lange, M.: Solving parity games in practice. In: Liu, Z., Ravn, A.P. (eds.) ATVA 2009. LNCS, vol. 5799, pp. 182–196. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-04761-9_15

- [FL15] Fisman, D., Lustig, Y.: A modular approach for Büchi determinization. In: CONCUR, pp. 368–382 (2015)

- [GL02] Giannakopoulou, D., Lerda, F.: From states to transitions: improving translation of LTL formulae to Büchi automata. In: Peled, D.A., Vardi, M.Y. (eds.) FORTE 2002. LNCS, vol. 2529, pp. 308–326. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-36135-9_20

- [KE12] Křetínský, J., Esparza, J.: Deterministic automata for the (F,G)-fragment of LTL. In: Madhusudan, P., Seshia, S.A. (eds.) CAV 2012. LNCS, vol. 7358, pp. 7–22. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-31424-7_7

- [KK14] Komárková, Z., Křetínský, J.: Rabinizer 3: safraless translation of LTL to small deterministic automata. In: Cassez, F., Raskin, J.-F. (eds.) ATVA 2014. LNCS, vol. 8837, pp. 235–241. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11936-6_17

- [Kle] Klein, J.: ltl2dstar - LTL to deterministic Streett and Rabin automata. http://www.ltl2dstar.de/

- [KLG13] Křetínský, J., Garza, R.L.: Rabinizer 2: small deterministic automata for LTL\GU. In: Van Hung, D., Ogawa, M. (eds.) ATVA 2013. LNCS, vol. 8172, pp. 446–450. Springer, Cham (2013). https://doi.org/10.1007/978-3-319-02444-8_32

- [KV15] Kini, D., Viswanathan, M.: Limit deterministic and probabilistic automata for LTL\GU. In: Baier, C., Tinelli, C. (eds.) TACAS 2015. LNCS, vol. 9035, pp. 628–642. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-46681-0_57

- [KV17] Kini, D., Viswanathan, M.: Optimal translation of LTL to limit deterministic automata. In: Legay, A., Margaria, T. (eds.) TACAS 2017. LNCS, vol. 10206, pp. 113–129. Springer, Heidelberg (2017). https://doi.org/10.1007/978-3-662-54580-5_7

- [ML16] Meyer, P.J., Luttenberger, M.: Solving mean-payoff games on the GPU. In: Artho, C., Legay, A., Peled, D. (eds.) ATVA 2016. LNCS, vol. 9938, pp. 262–267. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46520-3_17

- [MS17] Müller, D., Sickert, S.: LTL to deterministic Emerson-Lei automata. In: GandALF, pp. 180–194 (2017)

- [Pit06] Piterman, N.: From nondeterministic Büchi and Streett automata to deterministic parity automata. In: LICS, pp. 255–264 (2006)

- [SK16] Sickert, S., Křetínský, J.: MoChiBA: probabilistic LTL model checking using limit-deterministic Büchi automata. In: Artho, C., Legay, A., Peled, D. (eds.) ATVA 2016. LNCS, vol. 9938, pp. 130–137. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46520-3_9

- [TD14] Tian, C., Duan, Z.: Büchi determinization made tighter. Technical report abs/1404.1436, arXiv.org (2014)

- [VW86] Vardi, M.Y., Wolper, P.: An automata-theoretic approach to automatic program verification (preliminary report). In: LICS, pp. 332–344 (1986)

**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Strix: Explicit Reactive Synthesis Strikes Back!**

Philipp J. Meyer, Salomon Sickert(B), and Michael Luttenberger

Technical University of Munich, Munich, Germany {meyerphi,sickert,luttenbe}@in.tum.de

**Abstract.** Strix is a new tool for reactive LTL synthesis combining a direct translation of LTL formulas into deterministic parity automata (DPA) and an efficient, multi-threaded explicit-state solver for parity games. In brief, Strix (1) decomposes the given formula into simpler formulas, (2) translates these on-the-fly into DPAs based on the queries of the parity game solver, (3) composes the DPAs into a parity game, and at the same time already solves the intermediate games using strategy iteration, and (4) finally translates the winning strategy, if it exists, into a Mealy machine or an AIGER circuit with optional minimization using external tools. We experimentally demonstrate the applicability of our approach by a comparison with Party, BoSy, and ltlsynt using the syntcomp2017 benchmarks. In these experiments, our prototype can compete with BoSy and ltlsynt, with only Party performing slightly better. In particular, our prototype successfully synthesizes the full and unmodified LTL specification of the AMBA protocol for n = 2 masters.

#### **1 Introduction**

Reactive synthesis refers to the problem of finding, for a formal specification of an input-output relation (in our case given in *linear temporal logic (LTL)*), a matching implementation [22], e.g., a *Mealy machine* or an *and-inverter graph (AIG)*. Since the automata-theoretic approach to synthesis involves the construction of a potentially doubly exponentially sized automaton (in the length of the specification) [13], most existing tools focus on symbolic and bounded methods in order to combat the state-space explosion [5,9,11,18]. A beneficial side effect of these approaches is that they tend to yield succinct implementations.

In contrast to these approaches, we present a prototype implementation of an LTL synthesis tool which follows the automata theoretic approach using parity games as an intermediate step. Strix<sup>1</sup> uses the LTL-to-DPA translation

This work was partially funded and supported by the German Research Foundation (DFG) projects "Game-based Synthesis for Industrial Automation" (253384115) and "Verified Model Checkers" (317422601).

<sup>1</sup> https://strix.model.in.tum.de/

© The Author(s) 2018

H. Chockler and G. Weissenbacher (Eds.): CAV 2018, LNCS 10981, pp. 578–586, 2018. https://doi.org/10.1007/978-3-319-96145-3_31

presented in [10,23] and the multi-threaded explicit-state parity game solver presented in [14,20]: First, the given formula is decomposed into much simpler requirements, often resulting in a large number of safety and co-safety conditions and only a few requiring Büchi or parity acceptance conditions, which is comparable to the approach of [5,21]. These requirements are then translated on-the-fly into automata, maintaining the invariant that the parity game solver can easily compose the actual parity game. Further, by querying only those states that are actually required for deciding the winner, the implementation avoids unnecessary work.

The parity game solver is based on the *strategy iteration* of [19], which iteratively improves non-deterministic strategies, i.e., strategies that can allow several actions for a given state as long as they are all guaranteed to lead to the specified system behaviour. When translating the winning strategy into a Mealy automaton or an AIG, this non-determinism can be used similarly to "don't cares" when minimizing Boolean circuits. Strategy iteration offers two additional advantages: first, we can directly take advantage of multi-core systems; second, we can reuse the winning strategies computed for the intermediate arenas.

*Related Work and Experimental Evaluation.* Of the tools submitted to syntcomp2017, ltlsynt [15] is closest to our approach: it also combines an LTL-to-DPA translation with an explicit-state parity game solver, but it does not intertwine the two steps; instead, it uses a different approach for the translation, leading to one monolithic DPA which is then turned into a parity game. In contrast, the two best performing tools from syntcomp2017, BoSy and Party, use bounded synthesis, by reduction either to SAT, SMT, or safety games.

In order to give a realistic estimate of how our tool would have fared at syntcomp2017 (TLSF/LTL track), we tried to re-create the benchmark environment of syntcomp2017 as closely as possible on our hardware: in its current state, our tool would have been ranked below Party, but ahead of ltlsynt and BoSy. Due to time and resource constraints, we could only do an in-depth comparison with the current version of ltlsynt; in particular, we used the TLSF specification of the complete<sup>2</sup> AMBA protocol for *n* = 2 as a benchmark. We refer to Sect. 3 for details on the benchmarking procedure.

#### **2 Design and Implementation**

Strix is implemented in Java and C++. It supports LTL and TLSF [16] (only the reduced *basic* variant) as input languages, the latter being preferred since it contains more information about the specification. We describe the main steps of the tool in the following paragraphs, with examples given in Fig. 1.

<sup>2</sup> i.e., no decomposition into masters and clients and no structural properties were used.

*Splitting and Translation.* As a preprocessing step, the specification is split into syntactic (co)safety and (co)Büchi formulas, and one remaining general LTL formula. These are then translated into the simplest deterministic automaton class using the constructions of [10,23]. To speed up the process, these automata are constructed on-the-fly, i.e., states are created only if requested by later stages. Furthermore, since DPAs can be easily complemented, the implementation translates both the formula and its negation and chooses whichever is obtained faster.

**Fig. 1.** Synthesis of a simple arbiter with two clients. Here, a winning strategy is already obtained on the partial arena: always take any of the non-dashed edges.

*Arena Construction.* Here we construct one product automaton and combine the various acceptance conditions into a single parity acceptance condition: for this, we use the idea underlying the last-appearance-record construction, known from the translation of Muller games to parity games, to directly obtain a parity game again.
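The appearance-record idea can be sketched as follows: the product state carries a permutation of the acceptance indices ("colors"); on each step the observed color moves to the front of the record, and the emitted parity priority is derived from the position it previously occupied. A minimal Python sketch of this bookkeeping (illustrative only, not the Strix implementation):

```python
# Sketch of last-appearance-record (LAR) bookkeeping used to merge several
# acceptance indices into a single parity condition (illustrative only).

def lar_step(record, color):
    """Move `color` to the front of `record`; return the new record and the
    position the color previously occupied, from which a parity priority
    can be derived."""
    pos = record.index(color)
    new_record = [color] + record[:pos] + record[pos + 1:]
    return new_record, pos

record = [0, 1, 2]         # initial permutation of three acceptance indices
record, pos = lar_step(record, 2)
print(record, pos)
```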

*Parity Game Solving.* The parity game solver runs in parallel to the arena construction on the partially constructed game in order to guide the translation process, with the possibility of early termination when a winning strategy for the system player is found. It uses strategy iteration supporting non-deterministic strategies [19], from which we benefit in several ways: First, in the translation process, the current strategy stays valid when adding nodes to the arena and thus can be used as the initial strategy when solving the extended arena. Second, the non-deterministic strategies allow us to later heuristically select actions of the strategy that minimize the generated controller and to identify irrelevant output signals (similar to "don't care" cells in Karnaugh maps). Finally, the strategy iteration can easily take advantage of multi-core architectures [14,20].

*Controller Generation and Minimization.* From the non-deterministic strategy we obtain an incompletely specified Mealy machine and optionally pass it to MeMin [1], an external SAT-based minimizer for Mealy machines, to extract a more compact description.

*AIGER Circuit Generation and Minimization.* We translate the minimized Mealy machine with the tool Speculoos<sup>3</sup> into an AIGER circuit. In parallel, we also construct an AIGER circuit out of the non-minimized Mealy machine, since this can sometimes result in smaller circuits. The two AIGER circuits are then further compressed using ABC [6], and the smaller one is returned.

#### **3 Experimental Evaluation**

We evaluate Strix on the TLSF/LTL-track benchmark of the syntcomp2017 competition, which consists of 177 realizable and 67 unrealizable temporal logic synthesis specifications [15]. The experiment was run on a server with an Intel E5-2630 v4 clocked at 2.2 GHz (boost disabled). To mimic syntcomp2017, we imposed a limit of 8 threads for parallelization, a memory limit of 32 GB, and a timeout of one hour for each specification. Every specification for which a tool correctly decides realizability within these limits is counted as solved for the category **Realizability**, and every specification for which it can additionally produce an AIGER circuit that is successfully verified is counted as solved for the category **Synthesis**. For the latter, we verified the circuits with an additional time limit of one hour using the nuXmv model checker [7] with the check_ltlspec and check_ltlspec_klive routines run in parallel.

We compared Strix with ltlsynt in the latest release available at the time of writing (version 2.5). This version differs from the one used during syntcomp2017: it contains several improvements, but it also performs worse in a few cases and exhibits erroneous behaviour: for **Realizability**, it produced one wrong answer, and for **Synthesis**, it failed to produce AIGER circuits in 72 cases due to a program error.

Additionally, we compare our results with the best configurations of the top tools competing in syntcomp2017: Party (portfolio), ltlsynt, and BoSy (spot). Due to the difficulty of recreating the syntcomp2017 hardware setup<sup>4</sup>, we compiled the results for these tools in Table 1 from the syntcomp2017 webpage<sup>5</sup>, combining them with our results.

<sup>3</sup> https://github.com/romainbrenguier/Speculoos

<sup>4</sup> syntcomp2017 was run on an Intel E3-1271 v3 (4 cores/8 threads) at 3.6 GHz with 32 GB of RAM available for the tools. As stated above, we imposed the same constraints regarding timeout, maximal number of threads, and memory limit; but the Intel E3-1271 v3 runs at 3.6 GHz (with boost 4.0 GHz), while the Intel E5-2630 v4 used by us runs at only 2.2 GHz (boost disabled), resulting in a lower per-thread performance (potentially 30% slower); on the other hand, our system has a larger cache and a theoretically much higher memory bandwidth of up to 68.3 GB/s compared to 25.6 GB/s (for random reads, as in the case of dynamically generated parity games, these numbers are much closer). It seems therefore likely that for some benchmark-tool combinations our system is faster while for others it is slower.

<sup>5</sup> http://syntcomp.cs.uni-saarland.de/syntcomp2017/experiments/

The **Quality** rating compares the size of the solutions according to the syntcomp2017 formula, where a tool gets 2 − log<sub>10</sub>((*n* + 1)/(*r* + 1)) quality points for each verified solution of size *n* for a specification with reference size *r*. We now move on to a detailed discussion of the results and their interpretation.
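As a sketch, the quality formula reads as follows in Python (the function name is ours):

```python
# Sketch of the syntcomp2017 quality formula: 2 - log10((n+1)/(r+1))
# points per verified solution of size n against reference size r.
import math

def quality_points(n, r):
    return 2 - math.log10((n + 1) / (r + 1))

print(quality_points(9, 9))    # matching the reference size yields 2.0 points
print(quality_points(99, 9))   # a 10x larger solution yields 1.0 point
```

Solutions smaller than the reference thus score above 2 points, larger ones below.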

**Table 1.** Results for Strix compared with ltlsynt and selected results from syntcomp2017 on the TLSF/LTL-track benchmark and on notable instances. We mark timeouts by time, memouts by mem, and errors by err.


*Realizability.* We were able to correctly decide realizability for 163 and unrealizability for 51 specifications, resulting in 214 solved instances. We solve five instances that were previously unsolved in syntcomp2017.

*Synthesis.* We produced AIGER circuits for 148 of the realizable specifications. In 15 cases, we only constructed a Mealy machine, but the subsequent steps (MeMin for minimization or Speculoos for circuit generation) reached the time or memory limit. We were able to verify correctness for 146 of the circuits, reaching the model checking time limit in two cases. Together with the 51 specifications for which we determined unrealizability, this results in 197 solved instances.

*Quality.* We produced 36 solutions that are smaller than any solution produced during syntcomp2017. The most significant reductions are for the AMBA encoder and the full arbiter, with reductions of over 75%, and for ltl2dba E 4 and ltl2dba E 6, where we produce the smallest possible implementations.

#### **3.1 Effects of Minimization**

We could reduce the size of the Mealy machine in 80 cases, on average by 45%. However, the data showed that this did not always reduce the size of the generated AIGER circuit: in 13 cases (most notably for several arbiter specifications) the size of the circuit generated from the Mealy machine actually increased when minimization was applied (on average by 190%), while it decreased in 62 cases (on average by 55%).

We conjecture that the structure of the product arena is sometimes amenable to a compact representation as an AIGER circuit, and that this structure is lost in the (SAT-based) minimization. In these cases the SAT/SMT-based bounded synthesis tools such as BoSy and Party also have difficulties producing a small solution, if any at all.

#### **3.2 Synthesis of Complete AMBA AHB Arbiter**

To test the maturity and scalability of our tool, we synthesized the AMBA AHB arbiter [2], a common case study for reactive synthesis. We used the parameterized specification from [17] for *n* = 2 masters, which was also part of syntcomp2016, but was left unsolved by every tool. With a memory limit of 128 GB, we could decide realizability within 26 min and produce a Mealy machine with 83 states after minimization. While specialised GR(1) solvers [2,4,12] or decompositional approaches [3] are able to synthesize the specification in a matter of minutes, to the best of our knowledge ours is the first full LTL synthesis tool that can handle the complete, non-decomposed specification in a reasonable amount of time. For comparison, ltlsynt (2.5) needs more than 2.5 days on our system and produces a Mealy machine with 340 states.

#### **3.3 Discussion**

The ltlsynt tool is part of Spot [8], which uses a Safra-style determinization procedure for NBAs. Conceptually, it also uses DPAs and a parity game solver as a decision procedure. However, as shown in [10], the produced automata tend to be larger than those of our translation, which probably explains the lower quality score. Our approach has similar performance and scales better in certain cases. The instances where ltlsynt performs better than Strix are specifications that we cannot split efficiently, where the DPA construction becomes the bottleneck.

Bounded synthesis approaches (BoSy, Party) tend to produce smaller Mealy machines and to be able to handle larger alphabets. However, they fail when the minimal machine implementing the desired property is large, even if there is a compact implementation as a circuit. In our approach, we can often solve these cases and still regain compactness of the implementation through minimization afterwards. The strength of the Party portfolio is the combination of traditional bounded synthesis and a novel approach by reduction to safety games, which results in a large number of solved instances but reduces the average quality score.

**Future Work.** Strix combines Java (LTL simplification and automata translations) and C++ (parity game construction and solving). We believe that a pure C++ implementation will further improve the overall runtime and reduce the memory footprint. Next, there are several algorithmic questions we want to investigate going forward, especially expanding the parallelization of the tool. Furthermore, we want to reduce the dependency on external tools for circuit generation in order to be able to fine-tune this step better. Replacing Speculoos is especially important, since it turned out to be unable to handle complex transition systems.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Btor2, BtorMC and Boolector 3.0**

Aina Niemetz<sup>1,2</sup>(B), Mathias Preiner<sup>1,2</sup>, Clifford Wolf<sup>3</sup>, and Armin Biere<sup>1</sup>

<sup>1</sup> Johannes Kepler University Linz, Linz, Austria
<sup>2</sup> Stanford University, Stanford, USA niemetz@cs.stanford.edu
<sup>3</sup> Symbiotic EDA, Vienna, Austria

**Abstract.** We describe Btor2, a word-level model checking format for capturing models of hardware and potentially software in a bit-precise manner. This simple, line-based and easy to parse format can be seen as a sorted extension of the word-level format Btor. It uses design principles from the bit-level format Aiger and follows semantics of the Smt-Lib logics of bit-vectors with arrays. This intermediate format can be used in various verification flows and is perfectly suited to establish a word-level model checking competition. It is supported by our new open source model checker BtorMC, which is built on top of version 3.0 of our SMT solver Boolector. We further provide new word-level benchmarks on which these open source tools are evaluated.

Our format Btor2 generalizes and extends the Btor [5] format, which can be seen as a word-level generalization of the initial version of the bit-level format Aiger [2]. Btor is a format for quantifier-free formulas over bit-vectors and arrays with Smt-Lib [1] semantics, but it also provides sequential extensions for specifying word-level model checking problems with registers and memories. In contrast to Btor, which is tailored towards bit-vectors and one-dimensional bit-vector arrays, Btor2 has explicit sort declarations. It further allows registers and memories to be initialized explicitly (instead of the implicit initialization in Btor) and extends the set of sequential features with witnesses, invariant and fairness constraints, and liveness properties. All of these are word-level variants lifted from corresponding features in the latest Aiger format [4], the input format of the hardware model checking competition (HWMCC) [3,6] since 2011. We provide an open source Btor2 tool suite, which includes a generic parser, a random simulator, and a witness checker. We further implemented a reference bounded model checker, BtorMC, on top of our SMT solver Boolector. We consider Btor2 an ideal candidate for establishing a word-level hardware model checking competition.

#### **1 Format Description**

The syntax of Btor2 is shown in Fig. 1. The sort keyword is used to define arbitrary bit-vector and array sorts. This not only allows specifying multi-dimensional arrays but can be extended to support (uninterpreted) functions, floating points and other sorts. As a consequence, Btor2 is not backwards compatible with Btor. For clarity, in Fig. 1 we distinguish between node (line) identifiers nid and sort identifiers sid, and do not allow an identifier to occur in both sets. Introducing sorts renders type-specific keywords such as var, array and acond from Btor obsolete. Instead, Btor2 uses the keyword input to declare bit-vector and array variables of a given sort. Bit-vector constants are created as in Btor with the keywords const[dh], one, ones and zero.

Supported by Austrian Science Fund (FWF) under NFN Grant S11408-N23 (RiSE).

© The Author(s) 2018
H. Chockler and G. Weissenbacher (Eds.): CAV 2018, LNCS 10981, pp. 587–595, 2018. https://doi.org/10.1007/978-3-319-96145-3_32

**Fig. 1.** Syntax of Btor2. Non-terminals opidx and op are indexed and non-indexed operators as defined in Table 1 (sequential part in red). (Color figure online)

Bit-vector and array operators supported by Btor2 and their respective sorts are shown in Table 1. We use B<sub>n</sub> for a bit-vector sort of width *n*, and I and E for the index and element sorts of an array sort A<sub>I→E</sub>. Note that some bit-vector operators can be interpreted as *signed* or *unsigned*. In signed contexts, as in Smt-Lib, bit-vectors are represented in two's complement.
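As a small illustration of the signed/unsigned distinction (the helper names are our own, not part of the Btor2 tooling), an n-bit value can be read both ways as follows:

```python
# Sketch: interpreting an n-bit vector as unsigned vs. signed two's
# complement, matching the SMT-LIB convention mentioned above.

def as_unsigned(bits, n):
    # keep only the low n bits
    return bits & ((1 << n) - 1)

def as_signed(bits, n):
    u = as_unsigned(bits, n)
    # if the sign bit (bit n-1) is set, subtract 2^n
    return u - (1 << n) if u >= (1 << (n - 1)) else u

print(as_unsigned(0b1111, 4), as_signed(0b1111, 4))  # 15 -1
```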

#### **2 Sequential Extension**

As shown in Fig. 1, the sequential extension of Btor2 introduces a state keyword, which allows specifying registers and memories. In contrast to Btor, where registers are implicitly zero-initialized and memories are uninitialized, Btor2 provides a keyword init to explicitly define initialization functions for states. This enables us to also model partial initialization. For example, initializing a memory with a bit-vector constant zero zero-initializes the whole memory, whereas partially initializing a register can be achieved by applying a bit-mask to an uninitialized register.

**Table 1.** Operators supported by Btor2, where B<sub>n</sub> represents a bit-vector sort of size *n* and A<sub>I→E</sub> represents an array sort with index sort I and element sort E.

Transition functions for both registers and memories are defined with the next keyword. It takes the current and next states as arguments. A state variable without associated next function is treated as a *primary* input, i.e., it has the same behaviour as inputs defined via keyword input. Note that Btor provides a next keyword for registers and an anext keyword for memories. Using sorts in Btor2 avoids such sort specific keyword variants.

As in the latest version of Aiger [4], Btor2 supports bad state properties, which are essentially negations of safety properties. Multiple properties can be specified by simply adding multiple bad state properties. Invariant constraints can be introduced via the constraint keyword and are assumed to hold globally. A witness for a bad state property is an initialized finite path, which reaches (actually, contains) a bad state and satisfies all invariant constraints.

Again as in Aiger [4], the keywords fair and justice allow specifying (global) fairness constraints and (negations of) liveness properties. Each *justice* property consists of a set of Büchi conditions. A witness for a justice property is an infinite initialized path on which all Büchi conditions and all global fairness constraints are satisfied infinitely often. In addition, all global invariant constraints have to hold. The justice keyword takes a number (the number of Büchi conditions) and an arbitrary number of nodes (the Büchi conditions) as arguments.

#### **3 Witness Format**

The syntax of the Btor2 witness format is shown in Fig. 2. A Btor2 witness consists of a sequence of valid input assignments grouped by (time) frames. It starts with 'sat' followed by a list of properties that are satisfied by the witness. A property is identified by a prefix 'b' (for **b**ad) or 'j' (for **j**ustice) followed by a number *i*, which ranges over the number of defined *bad* and *justice* properties starting from 0. For example, 'b0 j0' refers to the first bad and first justice property in the order in which they occur in the Btor2 input. The list of properties is followed by a sequence of *k* + 1 frames at time *t* ∈ {0*,...,k*}. A *frame* is divided into a state and an input part. The *state* part starts with '#*t*' and is mandatory for the first frame (*t* = 0) and optional for later frames (*t >* 0). It contains state assignments at time *t*. The *input* part starts with '@*t*' and consists of input assignments of the transition from time *t* to *t* + 1. If states are uninitialized (no init), their initial assignment is required to be specified in frame '#0'. The state part is usually omitted for *t >* 0 since state assignments can be computed from states and inputs at time *t* − 1. While don't-care inputs can be omitted, our witness checker assumes that they are zero. Input and state assignments use the same numbering scheme as properties, i.e., states and inputs are numbered separately in the order they are defined, starting from 0. For example, 0 in frame '#*t*' (or '@*t*') refers to the first state (or input) as defined in the Btor2 input. For justice properties we assume the witness to be lasso-shaped, i.e., the next state, which can be computed from the last state and inputs at time *k*, is identical to one of the previous states at a time *t* ∈ {0*,...,k*}. As in Aiger, a Btor2 witness is terminated with '.' on a separate line.
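The frame structure just described can be sketched as a small reader. This is our own illustration, not one of the official Btor2 tools; it handles bit-vector assignments only (one 'index value' pair per line) and ignores comment lines:

```python
def parse_witness(text: str):
    """Minimal reader for a Btor2 witness (bit-vector assignments only).

    Returns the satisfied properties and a map
    t -> {'state': {...}, 'input': {...}} of frames.
    """
    lines = [ln for ln in text.strip().splitlines()
             if ln and not ln.startswith(';')]
    assert lines[0] == 'sat'
    props = lines[1].split()              # e.g. ['b0'] or ['b0', 'j0']
    frames, part = {}, None
    for ln in lines[2:]:
        if ln == '.':                     # end-of-witness marker
            break
        if ln[0] in '#@':                 # '#t' state part, '@t' input part
            frame = frames.setdefault(int(ln[1:]), {'state': {}, 'input': {}})
            part = frame['state' if ln[0] == '#' else 'input']
        else:
            idx, val = ln.split()[:2]     # 'index binary-value'
            part[int(idx)] = val
    return props, frames
```

For example, feeding it a witness that satisfies bad property 'b0' with input 0 set to 1 in frame 0 yields `(['b0'], ...)` with `frames[0]['input'][0] == '1'`.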


**Fig. 2.** Btor2 model and witness format syntax (sequential part in red). (Color figure online)

Figure 3 illustrates a simple C program (left), the corresponding Btor2 model with the negation of the assertion as a bad property (center), and a

**Fig. 3.** Example C program with corresponding Btor2 model and witness.

Btor2 witness for the violated property (right). The Btor2 model defines one bad property (a == 3 && b == 3), which is satisfied in frame 6. The corresponding witness identifies this property as bad property 'b0' (first bad property defined in the model). All states are initialized, hence '#0' is empty, and '@0' to '@6' indicate the assignments of input 0 (turn, the first input defined in the model) in frames 0 to 6, e.g., turn = 1 at *t* = 0, turn = 0 at *t* = 1 and so on. In frame 6, both states a and b reach value 3, and therefore property 'b0' is satisfied.

#### **4 Tools**

We provide a generic stand-alone parser for Btor2, which features basic type checking and consists of approx. 1,500 lines of C code. We implemented a reference bounded model checker BtorMC, which currently supports checking safety (a.k.a. bad state) properties for models with registers and memories and produces witnesses for satisfiable properties. Unrolling the model is performed by symbolic simulation, i.e., symbolic substitution of current state expressions into next state functions, and incremental SMT solving. We also implemented a simulator for randomly simulating Btor2 models, which further supports checking Btor2 witnesses. The model checker is tightly integrated into our SMT solver Boolector [18], an award-winning SMT solver for the theory of fixed-size bit-vectors with arrays and uninterpreted functions. Since the last major version [18], we have extended Boolector with several new features. Most notably, Boolector 3.0 now comes with support for quantified bit-vectors [24] and two different local search strategies for quantifier-free bit-vector formulas that do not rely on bit-blasting but can be combined with it [19,21,22]. It further provides support for Btor2. In contrast to previous versions of Boolector, Boolector 3.0 and all Btor2 tools are released under the MIT open source license, and the source code is hosted on GitHub<sup>1</sup>.
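To convey the idea of unrolling a model by substituting current states into next-state functions, here is a deliberately naive bounded check. Unlike BtorMC it enumerates concrete inputs instead of issuing incremental SMT queries, and the model of the Fig. 3 program (two counters incremented under an input turn) is our own guess at its shape, not the actual Btor2 encoding:

```python
from itertools import product

def bmc(init, next_fn, bad, inputs, k):
    """Search for a bad state within bound k by explicit unrolling.

    Returns (t, input trace) for the first violation found, else None.
    """
    for t in range(k + 1):
        for trace in product(inputs, repeat=t):
            state = init
            for i in trace:
                state = next_fn(state, i)   # substitute state into next function
            if bad(state):
                return t, trace
    return None

# Guessed toy model of the Fig. 3 program: input 'turn' selects which
# counter is incremented; the bad property is a == 3 and b == 3.
hit = bmc(init=(0, 0),
          next_fn=lambda s, turn: (s[0] + turn, s[1] + 1 - turn),
          bad=lambda s: s == (3, 3),
          inputs=(0, 1),
          k=6)
# a violation needs three steps with turn = 1 and three with turn = 0,
# so the shallowest witness is found at bound 6
```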

#### **5 Experiments**

We collected ten real-world (System)Verilog designs with safety properties from various open source projects [11,26–28]. The majority of these designs include memories. We used the open synthesis suite Yosys [29] to synthesize these designs into Btor2 and Smt-Lib. For Btor2, Yosys directly generates the models from a circuit description. For Smt-Lib, since the language does not support describing model checking problems, we used Yosys in combination with Yosys-SMTBMC to produce unrolled (incremental) problems.

We compared BtorMC against the most recent versions of Boolector (3.0) and Yices [10] (2.5.4), the two best solvers of the QF_ABV division of the SMT competition 2017. The Btor2 models serve as input for BtorMC, and the incremental Smt-Lib benchmarks serve as input for Boolector and Yices. All benchmarks, synthesis scripts, generated files, log files and the source code of our tools for this evaluation are available at http://fmv.jku.at/cav18-btor2.

The results in Table 2 show that our flow using Btor2 as an intermediate format is competitive with simple unrolling. Note that our model checker BtorMC issues incremental calls to Boolector. However, in Boolector, sophisticated word-level rewriting is currently disabled in incremental mode. We expect a major performance boost by fully supporting incremental word-level preprocessing.


**Table 2.** BtorMC/Btor2 vs. unrolled Smt-Lib with a time limit of 3600 s, where *k* is the bound and #bad is the number of bad properties.

<sup>1</sup> https://github.com/boolector.

#### **6 Conclusion**

We propose Btor2, a new word-level model-checking and witness format. For this format we provide a generic parser implementation, a simulator that also checks witnesses, and a reference bounded model checker BtorMC, which is tightly integrated with our SMT solver Boolector. These open source tools are evaluated on new real-world benchmarks, which we synthesized from open source hardware (System)Verilog models into Btor2 and Smt-Lib with Yosys. The tool Verilog2SMV [14] translates Verilog into model-checking problems in several formats, including nuXmv [7] and Btor. However, its translation to Btor is incomplete and its development has been discontinued.

We plan to provide a translator from Btor2 into SALLY [25] and VMT [8], which are both extensions of Smt-Lib to model symbolic transition systems. It might also be interesting to translate incremental Smt-Lib benchmarks and Horn clause models (as handled by, e.g., *µZ* [13]) into Btor2 and vice versa. We hope other compilers and model checkers such as SAL [9], EBMC [15] and ABC [12,16] will provide support to produce and read Btor2 models. We want to extend the format to other logics, in particular to support lambdas as in [23]. There is also a need for fuzzing [20] and delta-debugging tools [17].

Last but not least, we want to use this format to bootstrap a word-level model checking competition, which of course needs more benchmarks.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **Nagini: A Static Verifier for Python**

Marco Eilers(B) and Peter Müller

**Abstract.** We present Nagini, an automated, modular verifier for statically-typed, concurrent Python 3 programs, built on the Viper verification infrastructure. Combining established concepts with new ideas, Nagini can verify memory safety, functional properties, termination, deadlock freedom, and input/output behavior. Our experiments show that Nagini is able to verify non-trivial properties of real-world Python code.

#### **1 Introduction**

Dynamic languages have become widely used because of their expressiveness and ease of use. The Python language in particular is popular in domains like teaching, prototyping, and more recently data science. Python's lack of safety guarantees can be problematic when, as is increasingly the case, it is used for critical applications with high correctness demands. The Python community has reacted to this trend by integrating type annotations and optional static type checking into the language [20]. However, there is currently virtually no tool support for reasoning about Python programs beyond type safety.

We present Nagini, a sound verifier for statically-typed, concurrent Python programs. Nagini can prove memory safety, data race freedom, and user-supplied assertions. Nagini performs *modular* verification, which is important for verification to scale and to be able to verify libraries, and *automates* the verification process for programs annotated with specifications.

Nagini builds on many techniques established in existing tools: (1) Like VeriFast [10] and other tools [4,19,22], it uses separation logic style permissions [16] in order to locally reason about concurrent programs. (2) Like .NET Code Contracts [7], it uses a contract library to enable users to write code-level specifications. (3) Like many verification tools [2,6,11,13], it verifies programs by encoding the program and its specification into an intermediate verification language [1,8], namely Viper [14], for which automatic verifiers already exist.

Nagini combines these techniques with new ideas in order to verify advanced properties and handle the dynamic aspects of Python. In particular, Nagini implements a comprehensive system for verifying finite blocking [5] and input/output behavior [18], and builds on Mypy [12] to verify safety while also supporting important dynamic language features. Nagini is intended for verifying substantial, real-world code, and is currently used to verify the Python

© The Author(s) 2018

H. Chockler and G. Weissenbacher (Eds.): CAV 2018, LNCS 10981, pp. 596–603, 2018. https://doi.org/10.1007/978-3-319-96145-3_33

implementation of the SCION internet architecture [3]. To our knowledge, it is the first tool to enable automatic verification of Python code. Existing tools for JavaScript [21,24] also target a dynamic language, but focus on faithfully modeling JavaScript's complex semantics rather than practical verification of high-level properties.

Due to its wide range of verifiable properties, Nagini has applications in many domains: In addition to memory safety, programmers can choose to prove that a server implementation will stay responsive, that data science code has desired functional properties, or that algorithms terminate and preserve certain invariants, for example in a teaching context. Nagini is open-source and available online<sup>1</sup>, and can be used from the popular PyCharm IDE via a prototype plugin.

In this paper, we describe Nagini's supported Python subset and specification language, give an overview of its implementation and the encoding from Python to Viper, and provide an experimental evaluation of Nagini on real-world code.

#### **2 Language and Specifications**

*Python Subset:* Nagini requires input programs to comply with the static, nominal type system defined in PEP 484 [20] as implemented in the Mypy type checker [12], which requires type annotations for function parameters and return types, but can normally infer the types of local variables. Nagini fully supports the non-gradual part of Mypy's type system, including generics and union types.

The Python subset accepted by Mypy and Nagini can accommodate most real Python programs, potentially via some workarounds like using union types instead of structural typing. While our subset is statically typed, it includes many features and potential pitfalls not found in static languages, such as the dynamic addition and removal of fields from objects. Some other features, like reflection and dynamic code generation, are not supported.

Where compromises are necessary, Nagini aims for modularity, performance, and completeness for features typically found in user code over general support for all language features. As an example, Nagini works with a simplified model of Python's object attribute lookup behavior: A simple attribute access in Python leads to the invocation of several "magic" methods, which, if modelled correctly, would result in an overhead that would likely make automatic verification intractable. Nagini exploits the fact that these methods are mostly used to implement decorators, metaclasses, and system libraries, but rarely in user code. It assumes the default behavior of those methods, and implements direct support for frequently-used decorators and metaclasses that change their behavior. Importantly, Nagini flags an error if verified programs override these methods or are otherwise outside the supported subset, and is therefore sound.
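To illustrate why faithfully modelling attribute lookup would be costly, consider how a single attribute read can already trigger user-defined code in plain Python. This toy example (ours, not from the paper) uses the `__getattr__` hook, one of the "magic" methods involved in attribute lookup:

```python
class Lazy:
    """Toy class whose failed attribute lookups fall back to __getattr__."""
    def __getattr__(self, name: str):
        # called only when normal lookup does not find the attribute
        return f"computed:{name}"

obj = Lazy()
obj.x = 1            # ordinary attribute: found by normal lookup
# obj.x evaluates to 1, while obj.missing triggers __getattr__
```

A verifier that modelled every such hook at every attribute access would face a large overhead, which is why Nagini assumes the default lookup behavior unless a directly supported decorator or metaclass changes it.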

*Specification Language:* Nagini includes a library of specification functions similar to .NET Code Contracts [7] to express pre- and postconditions, loop invariants, and other assertions. Calls to these functions are interpreted as specifications by Nagini, but can be automatically removed before execution. Users can

<sup>1</sup> https://github.com/marcoeilers/nagini.

**Fig. 1.** Example program demonstrating Nagini's specification language. Contract functions are highlighted in italics. Note that functional specifications and postconditions are largely omitted to highlight the different specification constructs.

annotate Mypy-style type stub files for external libraries with specifications; the program will then be verified assuming they are correct. A detailed explanation of the specification language can be found in Nagini's Wiki<sup>2</sup>.

An example of an annotated program is shown in Fig. 1. The first two lines import the contract library and Python's library for type annotations. Pre- and postconditions are declared via calls to the contract functions Requires and Ensures in lines 17 and 10, respectively. The arguments of these functions are interpreted as assertions, which can be side-effect free boolean Python expressions or calls to other contract functions. Similarly, loops must be annotated with invariants (line 22), and special *exceptional* postconditions specify which exceptions a method may raise, and what postconditions must hold in this case. The Exsures annotation in line 18 states that a SoldoutException may be raised and makes no guarantees in this case. The invariant MustTerminate in line 25 specifies that the loop terminates; the argument represents a ranking function [5].
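To make the annotation style concrete without installing Nagini, the sketch below defines no-op stand-ins for two contract functions (the real ones come from Nagini's contract library and are interpreted statically rather than executed); the sum_upto function is our own example, not one from the paper:

```python
# No-op stand-ins so that the example runs as plain Python; Nagini would
# instead interpret these calls statically as proof obligations.
def Requires(assertion: bool) -> None: pass
def Invariant(assertion: bool) -> None: pass

def sum_upto(n: int) -> int:
    Requires(n >= 0)                   # precondition
    total, i = 0, 0
    while i < n:
        Invariant(0 <= i <= n)         # loop invariant
        total += i
        i += 1
    return total
```

Because the contract calls are ordinary (side-effect free) expressions, they can be stripped before execution, as the text notes.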

Like the underlying Viper language, Nagini uses Implicit Dynamic Frames (IDF) [23], a variation of separation logic [16], to achieve framing and allow local reasoning in the presence of concurrency. IDF establishes a system of *permissions* for heap locations that roughly corresponds to separation logic's points-to predicates. Methods may only read or write heap locations they currently hold a permission for, and can specify which permissions they require from and give

<sup>2</sup> https://github.com/marcoeilers/nagini/wiki.

back to their caller in their pre- and postconditions. Since there is only ever a single permission per heap location, holding a permission guarantees that neither other threads nor called methods can modify the respective location.

In Nagini, a permission is created when a field is assigned to for the first time; e.g., when executing line 9, the `__init__` method will have permission to three fields. Permission assertions are expressed using the Acc function (line 14). Assertions can be abstracted over using predicates [17], declared in Nagini by using annotated functions (line 12). In the example, the constructor of Ticket bundles all available permissions in the predicate state using the ghost statement Fold in line 9 and subsequently returns this predicate to its caller via its postcondition.

In addition, Nagini offers a second kind of permission that allows *creating* a field that does not currently exist, but cannot be used for reading (since that would cause a runtime error). Constructors implicitly get this kind of permission for every field mentioned in a class; in the example, such a permission is returned to the caller (line 10) and used in line 28. The loop invariant contains the permission to modify the res list using one of several built-in predicates for Python's standard data types (line 22) as well as permissions to the fields of all objects in the list (line 23). This kind of *quantified permission* [15], corresponding to separation logic's iterated separating conjunction, is one of two supported ways to express permissions over unbounded numbers of heap locations.

Other contract functions allow specifying, e.g., I/O behavior, and some have variations for advanced users, e.g., the Forall function can take trigger expressions to specify when the underlying SMT solver should instantiate the quantifier.

*Verified properties:* Nagini verifies some safety properties by default: Verified programs will not raise runtime errors or undeclared exceptions. The permission system guarantees that verified code is memory safe and free of data races. Nagini also verifies some properties that Mypy only checks optimistically, e.g., that referenced names are defined before they are used. As an example, if the Ticket class were defined after the order tickets function, Nagini would not allow calls to the function *before* the class definition, because of the call in line 26.

Beyond this, Nagini can verify (1) functional properties, (2) input/output properties, i.e., which I/O operations may or must occur, using a generalization of the method by Penninckx et al. [18], and (3) finite blocking [5], i.e., that no thread blocks indefinitely when trying to acquire a lock or join another thread, which includes deadlock freedom and termination. Verification is modular in the sense that adding code to a program only requires verifying the added parts; any code that verified before is guaranteed to still verify. Top level statements are an exception and have to be reverified when any part of the program changes, since Python's import mechanism is inherently non-modular.

#### **3 Implementation**

Nagini's verification workflow is depicted in Fig. 2. After parsing, Nagini invokes the Mypy type checker on the input and rejects the program if errors are found.

**Fig. 2.** Nagini verification workflow.

It then analyzes the input program and extracts structural information into an internal model, which is then encoded into a Viper program. The program is verified using one of the two Viper backends, which are based on symbolic execution (SE) and verification condition generation (VCG), respectively. Any resulting Viper-level error messages are mapped back to Python-level error messages.

*Encoding:* Nagini encodes Python programs into Viper programs that verify only if the original program was correct. At the top level, Viper programs consist of *methods*, whose bodies contain imperative code, side-effect free *functions*, and the aforementioned *predicates*, as well as *domains*, which can be used to declare and axiomatize custom data types. The structure of a created Viper program roughly follows the structure of the Python program: Each function in the Python program corresponds to either a method, a function, or a predicate in the Viper program, depending on its annotation. Additional Viper methods are generated to check proof obligations like behavioral subtyping and to model the execution of all top level statements.

Nagini maintains various kinds of ghost state, e.g., for verifying finite blocking and to represent which names are currently defined. It models Python's type system using a Viper domain axiomatized to reflect subtype relations. Nagini desugars complex Python language constructs into simple ones that exist in Viper, but subtle language differences often require additional effort in the encoding. As an example, Viper distinguishes references from primitive values whereas Python does not, requiring boxing and unboxing operations in the encoding.

*Tool interaction:* Nagini is invoked on an annotated Python file, and verifies this file and all (transitive) imports without user interaction. It then outputs either a success message or Python-level error messages that indicate type or verification errors, use of unsupported features, or invalid specifications, along with the source location. As an example, removing the Fold statement in line 9 of Fig. 1 yields the error message "Postcondition of init might not hold. There might be insufficient permission to access self.state(). (example.py@10.16)".

#### **4 Evaluation**

In addition to having a comprehensive test suite of over 12,500 lines of code, we have evaluated Nagini on a set of examples containing (parts of) implementations of standard algorithms from the internet<sup>3</sup>, the example from Fig. 1, a class from the SCION implementation, as well as examples from other verifiers translated to Python.

**Fig. 3.** Experiments. For each example, we list the lines of code (excluding whitespace and comments), the number of those lines that are used for specifications, the length of the resulting Viper program, properties (SF = safety, FC = functional correctness, FB = finite blocking, IO = input/output behavior) that could be verified (✓), could not be verified (✗) or were not attempted (-), and the verification times with Viper's SE backend, sequential and parallelized, in seconds.

Figure 3 shows the examples and which properties were verified; the functional property we proved for the binary search tree implementation is that it maintains a sorted tree. The examples cover language features like inheritance (example 10), comprehensions (3), dynamic field addition (6), operator overloading (3), union types (4), threads and locks (9), as well as specification constructs like quantified permissions (6) and predicate families (10). Nagini correctly finds an error in the SCION example and successfully verifies all other examples.

The runtimes shown in Fig. 3 were measured by averaging over ten runs on a Lenovo Thinkpad T450s running Ubuntu 16.04, Python 3.5 and OpenJDK 8 on a warmed-up JVM. They show that Nagini can effectively verify non-trivial properties of real-life Python programs in reasonable time. Due to modular verification, parts of a program can be verified independently and in parallel (which Nagini does by default), so that larger programs will not inherently lead to performance problems. This is demonstrated by the speedup achieved via parallelization on the two larger examples; for the smaller ones, verification time is dominated by a single complex method. Additionally, the annotation overhead is well within the range of other verification tools [9].

**Acknowledgements.** Thanks to Vytautas Astrauskas, Samuel Hitz, and Fábio Pakk Selmi-Dei for their contributions to Nagini. We gratefully acknowledge support from the Zurich Information Security and Privacy Center (ZISC).

<sup>3</sup> We chose examples that do not make use of dynamic features or external libraries from rosettacode.org, interactivepython.org and github.com/keon/algorithms.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## Peregrine**: A Tool for the Analysis of Population Protocols**

Michael Blondin, Javier Esparza, and Stefan Jaax(B)

Technische Universität München, Munich, Germany {blondimi,esparza,jaax}@in.tum.de

**Abstract.** We introduce Peregrine, the first tool for the analysis and parameterized verification of population protocols. Population protocols are a model of computation very much studied by the distributed computing community, in which mobile anonymous agents interact stochastically to achieve a common task. Peregrine allows users to design protocols, to simulate them both manually and automatically, to gather statistics of properties such as convergence speed, and to verify correctness automatically. This paper describes the features of Peregrine and their implementation.

**Keywords:** Population protocols · Distributed computing · Parameterized verification · Simulation

#### **1 Introduction**

Population protocols [1,3,4] are a model of distributed computing in which replicated, mobile agents with limited computational power interact stochastically to achieve a common task. They provide a simple and elegant formalism to model, e.g., networks of passively mobile sensors [1,5], trust propagation [13], evolutionary dynamics [14], and chemical systems, under the name chemical reaction networks [12,16,19].

Population protocols are parameterized: the number of agents does not change during the execution of the protocol, but is *a priori* unbounded. A protocol is correct if it behaves correctly for all of its infinitely many initial configurations. For this reason, it is challenging to design correct and efficient protocols.

In this paper we introduce Peregrine<sup>1</sup>, the first tool for the parameterized analysis of population protocols. Peregrine is intended for use by researchers in distributed computing and systems biology. It allows the user to specify protocols either through an editor or as simple scripts, and to analyze them via a

M. Blondin was supported by the Fonds de recherche du Québec – Nature et technologies (FRQNT).

<sup>1</sup> Peregrine can be found at https://peregrine.model.in.tum.de.

© The Author(s) 2018

H. Chockler and G. Weissenbacher (Eds.): CAV 2018, LNCS 10981, pp. 604–611, 2018. https://doi.org/10.1007/978-3-319-96145-3_34

graphical interface. The analysis features of Peregrine include manual step-by-step simulation; automatic sampling; statistics generation of average convergence speed; detection of incorrect executions through simulation; and formal verification of correctness. The first four features are supported for all protocols, while verification is supported for silent protocols, a large subclass of protocols [6]. Verification is performed automatically over *all* of the infinitely many initial configurations using the recent approach of [6] for solving the so-called well-specification problem.

*Related Work.* The problem of automatically verifying that a population protocol conforms to its specification for *one fixed initial configuration* has been considered in [10,11,17,20]. In [10], *ad hoc* search algorithms are used. In [11,17], the authors show how to model the problem in the probabilistic model checker Prism, and under certain conditions in Spin. In [20], the problem is modeled with the Pat toolkit for model checking under fairness assumptions. All these tools increase our confidence in the correctness of a protocol. However, compared to Peregrine, they are not visual tools, they do not offer simulation capabilities, and they can only verify the correctness of a protocol for a finite number of initial configurations, with typically a small number of agents. Peregrine proves correctness for all of the infinitely many initial configurations, with an arbitrarily large number of agents.

As mentioned in the introduction, population protocols are isomorphic to chemical reaction networks (CRNs), a popular model in natural computing. Cardelli et al. have recently developed model checking techniques and analysis algorithms for *stochastic* CRNs [7–9]. The problems studied therein are incomparable to the parameterized questions addressed by Peregrine.

The verification algorithm of Peregrine is based on [6], where a novel approach for the parameterized verification of silent population protocols has been presented. The command-line tool of [6] only offers support for proving correctness, with no functionality for visualization or simulation. Further, contrary to Peregrine, the tool cannot produce counterexamples when correctness fails.

#### **2 Population Protocols**

We introduce population protocols through a simple example and then briefly formalize the model. We refer the reader to [4] for a more thorough but still intuitive presentation. Suppose anonymous and mobile agents wish to take a majority vote. Intuitively, *anonymous* means that agents have no identity, and *mobile* that agents are "wandering around", and can only interact whenever they bump into each other. In order to vote, all agents conduct the following protocol. Each agent is in one out of four states {Y, N, y, n}. Initially all agents are in the states Y or N, corresponding to how they want to vote (states y and n are auxiliary states). Agents repeatedly interact pairwise according to the following rules:

$$a \colon YN \mapsto yn \qquad b \colon Yn \mapsto Yy \qquad c \colon Ny \mapsto Nn \qquad d \colon yn \mapsto yy$$

For example, if the population initially has two agents of opinion "yes" and one agent of opinion "no", then a possible execution is:

$$\{\underline{Y}, Y, \underline{N}\} \xrightarrow{a} \{y, \underline{Y}, \underline{n}\} \xrightarrow{b} \{y, Y, y\},\tag{1}$$

where, e.g., {Y, Y, N} denotes the multiset with two agents in state Y and one agent in state N, and the underlined agents are the ones chosen to interact.

The goal of every population protocol is to ensure that the agents eventually reach a lasting consensus, i.e., a multiset in which (1) either all agents are in "yes"-states, or all agents are in "no"-states, and (2) further interactions do not destroy the consensus. On top of this universal specification, each protocol has an individual goal, determining which initial configurations should reach the "yes" and the "no" lasting consensus. In the majority protocol above, the agents should reach a "yes"-consensus iff 50% or more agents vote "yes".

Execution (1) above leads to a lasting "yes"-consensus; further, the consensus is the right one, since 2 out of 3 agents voted "yes". In fact, assuming agents interact uniformly and independently at random, the above protocol is correct: executions almost surely reach a correct lasting consensus.
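
The dynamics just described can be sketched in a few lines of Python (a toy simulator under a uniformly random scheduler, not Peregrine's; rule c is written here as Ny ↦ Nn, the dual of rule b):

```python
import random

# Transitions of the majority protocol; rule c taken as Ny -> Nn.
RULES = {("Y", "N"): ("y", "n"), ("Y", "n"): ("Y", "y"),
         ("N", "y"): ("N", "n"), ("y", "n"): ("y", "y")}

def step(config):
    """Apply one randomly chosen enabled interaction; None if terminal."""
    enabled = [(i, j)
               for i in range(len(config)) for j in range(len(config))
               if i != j and (config[i], config[j]) in RULES]
    if not enabled:
        return None
    i, j = random.choice(enabled)
    config[i], config[j] = RULES[(config[i], config[j])]
    return config

def run(config, max_steps=10_000):
    """Run until a terminal configuration (or the step bound) is reached."""
    for _ in range(max_steps):
        if step(config) is None:
            break
    return sorted(config)

random.seed(0)
print(run(["Y", "Y", "N"]))  # ['Y', 'y', 'y']
```

From {Y, Y, N} every execution ends in the terminal configuration {Y, y, y}, matching execution (1).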

More formally, a population protocol is a tuple $(Q, T, I, O)$ where $Q$ is a finite set of *states*, $T \subseteq Q^2 \times Q^2$ is a set of *transitions*, $I \subseteq Q$ is the set of *initial states*, and $O \colon Q \to \{0, 1\}$ is the *output mapping*. A *configuration* is a non-empty multiset over $Q$, an *initial configuration* is a non-empty multiset over $I$, and a configuration is *terminal* if it cannot be altered by any transition. A configuration is in a *consensus* if all of its states map to the same output under $O$.

An *execution* is a finite or infinite sequence $C_0 \xrightarrow{t_1} C_1 \xrightarrow{t_2} \cdots$ such that $C_i$ is obtained by applying transition $t_i$ to $C_{i-1}$. A *fair execution* is either a finite execution that reaches a terminal configuration, or an infinite execution such that for every configuration $D$: if $\{i \in \mathbb{N} : C_i \xrightarrow{*} D\}$ is infinite, then $\{i \in \mathbb{N} : C_i = D\}$ is infinite. In other words, fairness ensures that a configuration cannot be avoided forever if it is reachable infinitely often. Fairness is an abstraction of the random interactions occurring within a population. A configuration $C$ is in a *lasting consensus* if every execution from $C$ only leads to configurations of the same consensus.

If for every initial configuration $C$, all fair executions from $C$ lead to a lasting consensus $\varphi(C) \in \{0, 1\}$, then we say that the protocol *computes* the predicate $\varphi$. For example, the above majority protocol with $O(Y) = O(y) = 1$ and $O(N) = O(n) = 0$ computes the predicate $C[Y] \geq C[N]$, where $C[x]$ denotes the number of occurrences of state $x$ in $C$. A protocol does not necessarily compute a predicate. For example, if we alter the majority protocol by removing transition $d$, then $\{Y, N\} \xrightarrow{a} \{y, n\}$ is a fair execution, but $\{y, n\}$ is not in a consensus. In other words, transition $d$ acts as a tie-breaker which allows the agents to reach the consensus configuration $\{y, y\}$. A protocol that computes a predicate is said to be *well-specified*. It is well known that well-specified population protocols compute precisely the predicates definable in Presburger arithmetic [3]. On top of different *majority protocols* for the predicate $C[x] \geq C[y]$, the literature contains, e.g., different families of so-called *flock-of-birds protocols* for the predicates $C[x] \geq c$, where $c$ is an integer constant, and families of *threshold protocols* for the predicates $a_1 \cdot C[x_1] + \cdots + a_n \cdot C[x_n] \geq c$, where $a_1, \ldots, a_n, c$ are integer constants and $x_1, \ldots, x_n$ are initial states.
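
For one fixed population size, correctness of the majority protocol can be checked by brute force: enumerate all configurations reachable from each initial configuration and test that every terminal one is in the consensus prescribed by C[Y] ≥ C[N] (ties going to "yes"). The sketch below checks one size at a time, unlike Peregrine's proof for all sizes; rule c is taken as Ny ↦ Nn:

```python
# Transitions (rule c as Ny -> Nn) and the output mapping of Sect. 2.
RULES = {("Y", "N"): ("y", "n"), ("Y", "n"): ("Y", "y"),
         ("N", "y"): ("N", "n"), ("y", "n"): ("y", "y")}
OUTPUT = {"Y": 1, "y": 1, "N": 0, "n": 0}

def successors(cfg):
    """All configurations (as sorted tuples) reachable in one interaction."""
    succ = set()
    for i in range(len(cfg)):
        for j in range(len(cfg)):
            if i != j and (cfg[i], cfg[j]) in RULES:
                new = list(cfg)
                new[i], new[j] = RULES[(cfg[i], cfg[j])]
                succ.add(tuple(sorted(new)))
    return succ

def check_majority(n):
    """Every terminal configuration reachable from any initial configuration
    of size n must be a consensus agreeing with C[Y] >= C[N]."""
    for k in range(n + 1):
        init = tuple(sorted(["Y"] * k + ["N"] * (n - k)))
        expected = 1 if k >= n - k else 0
        seen, stack = {init}, [init]
        while stack:
            cfg = stack.pop()
            succ = successors(cfg)
            if not succ and {OUTPUT[q] for q in cfg} != {expected}:
                return False
            for s in succ:
                if s not in seen:
                    seen.add(s)
                    stack.append(s)
    return True

print(all(check_majority(n) for n in range(1, 7)))  # True
```

Since the protocol is silent, every fair execution ends in a terminal configuration, so checking the reachable terminal configurations suffices here.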

### **3 Analyzing Population Protocols**

Peregrine is a web tool with a JavaScript frontend and a Haskell backend. The backend makes use of the SMT solver Z3 [15] to test satisfiability of Presburger arithmetic formulas. The user has access to four main features through the graphical frontend. We present these features in the remainder of the section.

**Protocol Description.** Peregrine offers a description language for both single protocols and families of protocols depending on some parameters. Single protocols are described either through a graphical editor or as simple Python scripts. Families of protocols (called parametric protocols) can only be specified as scripts, but Peregrine assists the user by generating a code skeleton.

**Simulation.** Population protocols can be simulated through a graphical player depicted in Fig. 1. The user can pick an initial configuration and simulate the protocol either by manual selection of interactions, or by letting a scheduler pick interactions uniformly at random. The simulator keeps a history of the execution which can be rewound at any time, making it easy to experiment with the different behaviours of a protocol. Configurations can be displayed in two ways: either as explicit populations, as illustrated in Fig. 1, or as bar charts of the state counts, which are more convenient for large populations.

**Fig. 1.** Simulation of the majority protocol from the initial configuration {5·Y, 10·N}.

**Statistics.** Peregrine can generate statistics from batch simulations. The user provides four parameters: smin, smax, m and n. Peregrine generates n random executions as follows. For each execution, a number s is picked uniformly at random from [smin, smax], and an initial configuration of size s is then picked uniformly at random. Each step of an execution is picked uniformly at random among enabled interactions. If no terminal configuration is reached within m steps, then the simulation halts. In the end, n executions of length at most m are gathered. Peregrine classifies the generated executions according to their consensus, and computes statistics on the convergence speed (see the next two paragraphs). The results can be visualized in different ways, and the raw data can be exported as a JSON file.

*Consensus.* For each random execution, Peregrine checks whether the last configuration of the execution is in a consensus and, if so, whether the consensus corresponds to the expected output of the protocol. Peregrine reports which percentage of the executions reach a consensus, and whether the consensus is correct and/or lasting. In normal mode, Peregrine only classifies an execution as reaching a lasting consensus if it ends in a terminal configuration. In the *increased accuracy* mode, if the execution ends in a configuration $C$ of consensus $b \in \{0, 1\}$, then the model checker LoLA [18] is used to determine whether there exists a configuration $C'$ such that $C \xrightarrow{*} C'$ and $C'$ is not of consensus $b$. If this is not the case, then Peregrine concludes that $C$ is in a lasting consensus. Peregrine plots the percentage of executions in each category as a function of the population size, as illustrated on the left of Fig. 2.

*Average Convergence Speed.* Peregrine also provides statistics on the convergence speed of a protocol. Let $C_0 \xrightarrow{t_1} C_1 \xrightarrow{t_2} \cdots \xrightarrow{t_\ell} C_\ell$ be an execution such that $C_\ell$ is in a consensus $b \in \{0, 1\}$. The *number of steps to convergence* of the execution is defined as 0 if all configurations are of consensus $b$, and otherwise as $i + 1$, where $i$ is the largest index such that $C_i$ is not in consensus $b$. For each population size, Peregrine computes the average number of steps to convergence over all consensus executions of that population size, and plots the information as illustrated on the right of Fig. 2.
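
The definition translates directly into code (a sketch; `OUTPUT` is the output mapping of the majority protocol from Sect. 2):

```python
OUTPUT = {"Y": 1, "y": 1, "N": 0, "n": 0}

def consensus(cfg):
    """Return 0 or 1 if cfg is in a consensus, else None."""
    outs = {OUTPUT[q] for q in cfg}
    return outs.pop() if len(outs) == 1 else None

def steps_to_convergence(execution):
    """0 if every configuration already has the final consensus b; otherwise
    i + 1 for the largest index i with C_i not in consensus b."""
    b = consensus(execution[-1])
    assert b is not None, "last configuration must be in a consensus"
    for i in range(len(execution) - 1, -1, -1):
        if consensus(execution[i]) != b:
            return i + 1
    return 0

# Execution (1) from Sect. 2: {Y,Y,N} -> {y,Y,n} -> {y,Y,y}
trace = [("Y", "Y", "N"), ("y", "Y", "n"), ("y", "Y", "y")]
print(steps_to_convergence(trace))  # 2
```

Here the largest index not in "yes"-consensus is 1 (the configuration {y, Y, n}), so convergence takes 2 steps.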

**Fig. 2.** Statistics for 5000 random executions of the approximate majority protocol of [2], of length at most 40, from initial configurations of size at most 25. The left plot shows the percentage of executions reaching a consensus (dark green: lasting correct, light green: correct, light red: incorrect, dark red: lasting incorrect) and no consensus (orange). In this example the occurrences of light red are negligible. The right plot shows the average number of steps to convergence. (Color figure online)

**Fig. 3.** Verification of the majority protocol of Sect. 2 without transition $d \colon yn \mapsto yy$.

**Verification.** Peregrine can automatically verify that a population protocol computes a given predicate. Predicates can be specified by the user in quantifier-free Presburger arithmetic extended with the family of predicates $\{x \equiv y \pmod{c}\}_{c \geq 2}$, which is equivalent in expressiveness to full Presburger arithmetic. For example, for the majority protocol of Sect. 2, the user simply specifies `C[Y] >= C[N]`.

Peregrine implements the approach of [6] to verify correctness of protocols which are silent. A protocol is said to be *silent* if from every initial configuration, every fair execution leads to a terminal configuration. The majority protocol of Sect. 2 and most existing protocols from the literature are silent [6]. We briefly describe the approach of [6] and how it is integrated into Peregrine.

Suppose we are given a population protocol P and we wish to determine whether it computes a predicate <sup>ϕ</sup>. The procedure first tries to prove that <sup>P</sup> is silent. This is done by verifying a more restricted condition called *layered termination*. Verifying the latter property reduces to testing satisfiability of a Presburger arithmetic formula. If this formula holds, then the protocol is silent, otherwise no conclusion is derived. However, essentially all existing silent protocols satisfy layered termination [6].

Once $P$ is proven silent, the procedure attempts to prove that no "bad execution" exists. More precisely, it checks whether there exist configurations $C_0$ and $C_1$ such that $C_0 \xrightarrow{*} C_1$, $C_0$ is initial, $C_1$ is terminal, and $C_1$ is not in consensus $\varphi(C_0) \in \{0, 1\}$. Since reachability is not definable in Presburger arithmetic, a Presburger-definable over-approximation $\rightsquigarrow$ of the reachability relation $\xrightarrow{*}$, borrowed from Petri net theory, is used instead. We obtain the following formula $\Phi_{\text{bad-exec}}$:

$$\exists C_0, C_1 \colon\; C_0 \rightsquigarrow C_1 \;\wedge \bigwedge_{q \notin I} C_0[q] = 0 \;\wedge \bigwedge_{t \in T} \mathrm{succ}(C_1, t) \subseteq \{C_1\} \;\wedge \bigvee_{q \in C_1} O(q) = \neg\varphi(C_0)$$

If $\Phi_{\text{bad-exec}}$ is unsatisfiable, then $P$ is correct. Otherwise, no conclusion is reached, and $\Phi_{\text{bad-exec}}$ is iteratively strengthened by enriching the over-approximation of $\xrightarrow{*}$. Whenever $\Phi_{\text{bad-exec}}$ is satisfied by some $(C_0, C_1)$, Peregrine calls the model checker LoLA to test whether $C_1$ is indeed reachable from $C_0$. If so, then Peregrine reports $P$ to be incorrect and generates a counterexample execution, which can be replayed or exported as a JSON file (see Fig. 3).

Currently Peregrine can verify protocols with up to a hundred states and a few thousand transitions. The bottleneck is the size of the constraint system. Due to lack of space, we refer the reader to [6] for detailed experimental results.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **ADAC: Automated Design of Approximate Circuits**

Milan Češka(B), Jiří Matyáš, Vojtěch Mrázek, Lukáš Sekanina, Zdeněk Vašíček, and Tomáš Vojnar

> Faculty of Information Technology, IT4Innovations Centre of Excellence, Brno University of Technology, Brno, Czech Republic ceskam@fit.vutbr.cz

**Abstract.** Approximate circuits with relaxed requirements on functional correctness play an important role in the development of resource-efficient computer systems. Designing approximate circuits is a very complex and time-demanding process of seeking optimal trade-offs between the approximation error and resource savings. In this paper, we present ADAC—a novel framework for automated design of approximate arithmetic circuits. ADAC integrates in a unique way efficient simulation and formal methods for approximate equivalence checking into a search-based circuit optimisation. To make ADAC easily accessible, it is implemented as a module of the ABC tool: a state-of-the-art system for circuit synthesis and verification. Within several hours, ADAC is able to construct high-quality Pareto sets of complex circuits (including even 32-bit multipliers), providing useful trade-offs between the resource consumption and the error that is formally guaranteed. This demonstrates outstanding performance and scalability compared with other existing approaches.

#### **1 Introduction**

In recent years, reduction of the power consumption of computer systems and mobile devices has become one of the biggest challenges in the computer industry. *Approximate computing* has been established as a new research field aiming at reducing system resource demands (and, in particular, power demands) by relaxing the requirement that all computations are always performed correctly. Approximate computing exploits the fact that many applications, including image and multimedia processing, signal processing, data mining, machine learning, neural networks, and scientific computations, are *error resilient*, i.e.

This work was supported by the IT4Innovations excellence in science project No. LQ1602.

© The Author(s) 2018

H. Chockler and G. Weissenbacher (Eds.): CAV 2018, LNCS 10981, pp. 612–620, 2018. https://doi.org/10.1007/978-3-319-96145-3\_35

produce acceptable results even though the underlying computations are performed with a certain error. Therefore, the error can be used as a design metric and traded for chip area, power consumption, or runtime. Chippa et al. [7] claim that almost 80% of runtime is spent in procedures that could be approximated.

Approximate computing can be conducted at different system levels, with arithmetic circuit approximation being one of the most popular, as such circuits are frequently used in the core computations. In our work, we focus on functional approximation, where the original circuit is replaced by a less complex one which exhibits some errors but improves non-functional circuit parameters such as power consumption or chip area. Circuit approximation can be formulated as an optimisation problem where the error and the non-functional circuit parameters are conflicting design objectives. Designing complex approximate circuits is a time-demanding and error-prone process. Moreover, automating it is challenging too, since the design space of candidate solutions is huge and checking that a candidate solution has the required error is itself a computationally demanding task, especially if formal guarantees on the error have to be ensured.

In this tool paper, we present *ADAC* <sup>1</sup>—a novel framework for automated design of approximate circuits. The framework implements a design loop including (i) a *generator* of candidate solutions employing genetic search algorithms, (ii) an *evaluator* estimating non-functional parameters of a candidate solution, and (iii) a *verifier* checking that the candidate solution does not exceed the permissible error. ADAC is integrated as a new module into the ABC tool—a state-of-the-art and widely used system for circuit synthesis and verification [1]. The framework takes the following inputs:


With these inputs, ADAC searches for an approximate circuit satisfying the error threshold and having the minimal estimated chip area. Previous works [3,14,20, 22] confirmed that the chip area is a good optimization objective as it highly correlates with power consumption, which is a crucial target in approximate computing.

The results of [21] clearly demonstrate that search algorithms based on *Cartesian Genetic Programming* (CGP) [12] are well capable of generating high-quality approximate circuits. For complex circuits, however, a high number of candidate solutions has to be generated and evaluated, which significantly limits the scalability of the design process. Our framework implements several approaches for error evaluation suitable for different error metrics and application domains. They include both *SAT and BDD-based techniques* for

<sup>1</sup> https://github.com/imatyas/ADAC.

approximate equivalence checking providing *formal error guarantees* as well as a *bit-parallel circuit simulation* utilising the computing power of modern processors. We also implement a novel search strategy that drives the search towards *promptly verifiable approximate circuits*, which significantly accelerates the design process in many cases [3]. As such, the framework offers a unique integration of techniques based on simulation, formal reasoning, and evolutionary circuit optimisation. Our extensive experimental evaluation demonstrates that ADAC offers outstanding performance and scalability compared with existing methods and tools and paves a way towards an automated design process of complex provably-correct circuit approximations.

#### **2 Architecture and Implementation**

The ADAC framework has a modular architecture illustrated in Fig. 1.

The setup phase is responsible mainly for preparing a chromosome representation of the golden circuit. The circuit is given in a high-level Verilog format, which is first translated to a gate-level representation using the tool Yosys [25], and then the chromosome representation is obtained using our V2CH script. The setup phase is also responsible for generating a configuration file controlling the main design loop. It is generated from the user inputs and optional parameters for CGP and search strategies.

**Fig. 1.** A scheme of the ADAC architecture.

The design loop consists of three components: (i) a generator of candidate designs, (ii) an evaluator of non-functional parameters of the candidate circuit (currently estimating the chip area), and (iii) a verifier evaluating the candidate error. The chip area and the error form a basis of the *fitness function*, whose value is minimised via our search strategy. In particular, the fitness is infinity if the circuit error exceeds the given threshold, and the chip area otherwise. In the future, we plan to support a more general specification of the fitness. As an additional feature, ADAC can also quantify the difference (in the given metric) between two given circuits.

The real values of non-functional parameters, such as the chip area or the power-delay product (PDP), depend on the target technology, and the synthesis of an optimal implementation of the given circuit using the target technology is highly time-consuming. Therefore, our design loop currently uses the *chip area* as the sole non-functional parameter. The chip area is estimated as the sum of the sizes of the gates of the circuit, which are given as one of the inputs of ADAC. The chip area is typically a good estimate of the power consumption [3, 14,20,22]. The output of ADAC (in the gate-level Verilog format) can be passed to industrial circuit design tools to obtain accurate circuit parameters for the target technology. In our experiments, we report PDP for the 45 nm technology synthesised by the Synopsys Design Compiler [19].
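
The fitness described above (infinity when the error exceeds the threshold, estimated chip area otherwise) can be sketched as follows; the per-gate area table is hypothetical, since the real sizes are part of ADAC's input:

```python
INF = float("inf")

# Hypothetical per-gate area table (arbitrary units); in ADAC the gate
# sizes are supplied as one of the inputs.
GATE_AREA = {"and": 2, "or": 2, "xor": 3, "not": 1}

def chip_area(gates):
    """Estimate chip area as the sum of the sizes of the circuit's gates."""
    return sum(GATE_AREA[g] for g in gates)

def fitness(gates, error, threshold):
    """Infinity if the circuit error exceeds the threshold, area otherwise."""
    return INF if error > threshold else chip_area(gates)

print(fitness(["and", "xor", "not"], error=0.01, threshold=0.05))  # 6
print(fitness(["and", "xor", "not"], error=0.10, threshold=0.05))  # inf
```

Minimising this fitness thus shrinks the area only among candidates that respect the permissible error.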

We now briefly describe the candidate circuit generator and three methods for error evaluation that are currently supported in ADAC.

The *candidate circuit generator* is based on CGP, where a candidate solution is encoded as a chromosome describing an oriented acyclic graph, given as a 2-dimensional array of 2-input nodes. Every node is numbered and is encoded by 3 integers, where the first two numbers denote the inputs and the third represents the function of the node. New candidate circuits are obtained using a mutation operator that performs random changes in the chromosome. The mutations can modify either the node interconnection or the node function. The area of candidate circuits is reduced by making some nodes unreachable (such nodes, however, are removed only at the very end, and so they can still be mutated and even become reachable again). The candidates are evaluated, and the one with the best fitness value is used in the next iteration of the design loop. The whole loop starts with the golden circuit and iteratively generates approximate solutions with better fitness values until a termination criterion (typically a given time limit) is met. Optionally, the user can provide an approximate circuit satisfying the error threshold as a seed to start from.
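
A simplified single-row CGP encoding with such a mutation operator can be sketched as follows (illustrative only; ADAC's actual chromosome layout, function set, and mutation parameters may differ):

```python
import random

# Toy node-function library (assumed for illustration).
FUNCS = [lambda a, b: a & b,   # 0: AND
         lambda a, b: a | b,   # 1: OR
         lambda a, b: a ^ b]   # 2: XOR

def evaluate(chromosome, n_in, inputs):
    """Evaluate a single-row CGP chromosome; the last node is the output.
    Node k is encoded as (in1, in2, func); inputs are numbered 0..n_in-1."""
    vals = list(inputs)
    for i1, i2, f in chromosome:
        vals.append(FUNCS[f](vals[i1], vals[i2]))
    return vals[-1]

def mutate(chromosome, n_in, rate=0.2):
    """Randomly rewire one node input or change a node function. A node may
    only read primary inputs or earlier nodes, so the graph stays acyclic."""
    out = []
    for idx, (i1, i2, f) in enumerate(chromosome, start=n_in):
        if random.random() < rate:
            field = random.randrange(3)
            if field == 2:
                f = random.randrange(len(FUNCS))
            elif field == 0:
                i1 = random.randrange(idx)
            else:
                i2 = random.randrange(idx)
        out.append((i1, i2, f))
    return out

# Inputs 0, 1; nodes 2..4.  Node 4 computes (a & b) | (a ^ b), i.e. a | b.
chrom = [(0, 1, 0), (0, 1, 2), (2, 3, 1)]
print(evaluate(chrom, 2, [1, 0]))  # 1
```

Unreachable nodes (those no path leads from to the output) simply contribute nothing when the area is computed over the reachable part, mirroring the behaviour described above.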

The *bit-parallel circuit simulation* supports all common error metrics, including the worst-case error (WCE), the mean error, the error rate representing the number of inputs leading to an incorrect output, and the Hamming distance. It utilises the power of modern processors by simulating the circuit on multiple input vectors (e.g. 64 inputs for 64-bit processors) in a single pass through the circuit [24]. However, despite the parallel processing that significantly accelerates the simulation, for circuits with arguments of larger bit-widths (beyond 12 bits), it is not feasible to simulate the circuits on all possible inputs, and so only statistical guarantees on the approximation error can be provided.
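
The bit-parallel trick can be illustrated with ordinary Python integers acting as machine words: each wire holds one bit per test vector, so a single pass through the gates evaluates all vectors at once (a sketch with a one-bit full adder, not ADAC's simulator):

```python
def full_adder(a, b, cin):
    """Gate-level one-bit full adder evaluated on whole words of vectors."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

# Pack 4 test vectors: vector k sits in bit k of each word.
# Vectors (a, b, cin): (0,0,0), (1,0,0), (1,1,0), (1,1,1).
a, b, cin = 0b1110, 0b1100, 0b1000
s, cout = full_adder(a, b, cin)
print(bin(s), bin(cout))  # 0b1010 0b1100  (sums 0,1,0,1; carries 0,0,1,1)
```

On a 64-bit machine the same pass evaluates 64 vectors per word, which is where the speedup reported in [24] comes from.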

The *BDD-based evaluation* also supports all common error metrics, and, unlike simulation, it is able to provide formal error guarantees for circuits with larger input bit-widths. For the purpose of the evaluation, the original correct circuit and its approximation are interconnected into an auxiliary circuit called a *miter* such that the error can be deduced from its output (e.g. to compute the error rate, the outputs of the golden and candidate circuits are subtracted, and the result is compared with 0). The miter is encoded as a BDD on which the circuit error is evaluated using BDD operations [22,23]. However, this technique does not scale well with the complexity of the circuits in terms of the number of their gates, as the resulting BDD representation becomes prohibitively large. Hence, this approach works well for large adders and similar circuits, but it fails, e.g., for multipliers beyond 12 bits.

The *SAT-based evaluation* currently supports WCE only, but it provides formal guarantees and a superior performance to the BDD-based technique. ADAC implements a novel miter construction based on subtracting the outputs of the golden and approximate circuits, followed by a comparison with the error threshold [3]. The construction is optimised for SAT-based evaluation by avoiding long XOR chains known to cause poor performance of state-of-the-art SAT solvers [5,9]. This allows us to exploit the ABC engine iprove, designed originally for miter-based exact circuit equivalence checking, to quickly evaluate the WCE.

The final ingredient of the design process is the *search strategy*. Apart from the standard evolutionary strategies based solely on the fitness function, ADAC also implements a novel verifiability-driven approach [3] combined with the SAT-based evaluation.

The *verifiability-driven search strategy* uses a limit *L* on the resources available to the underlying SAT decision procedure. The limit effectively controls the time the SAT solver can use. We require that every improving candidate be verifiable within the resource limit *L*. Therefore, the strategy drives the search towards candidates that improve the fitness and can be promptly evaluated. As a result, we can evaluate a much larger set of candidate circuits in the given time. Our experiments indicate that this strategy often leads to a higher number of improving solutions and thus finds circuits with a smaller chip area that still meet the permissible error. On the other hand, it can happen that, for a limit *L*, no improving sequence exists, while one exists for a slightly greater resource limit. We are currently implementing auto-adaptive techniques that should automatically select an adequate resource limit for the given circuit.

**Integration into the ABC Tool.** To make ADAC easily accessible, it is implemented as a new module for the ABC tool. ABC allows us to support an important subset of the Verilog specification and implementation language. We also utilize ABC to translate the circuits among the different intermediate representations used for constructing miters. As mentioned before, we employ the iprove engine in our SAT-based method for evaluating the WCE. Note that iprove uses MiniSat [18] as the SAT solver. Despite the fact that ABC supports BDD-based circuit representation and manipulation, we implemented our own BDD component (based on the BuDDy library [2]) that is tailored to evolutionary circuit approximation.

**Extensibility.** Due to its modular architecture, ADAC can be easily extended. Apart from the extensions mentioned above, we are working on a new component for error evaluation based on SAT counting methods (e.g. #SAT [4]) that could offer formal guarantees and better scalability for the mean-error and error-rate metrics, and on new candidate circuit generators exploiting counter-examples produced during the verification of candidate circuits. In the long term, we plan to generalise the underlying methods and also support the design of approximate sequential circuits.

#### **3 Evaluation, Related Works, and Applications**

We first compare the performance of the different methods of circuit error evaluation supported in ADAC. For that, we use results from adder approximation obtained from 10 runs, each for 5 min. The table in Fig. 2 shows average runtimes of a single error evaluation using the bit-parallel simulation, the BDD-based approach, and the SAT-based approach. The reported speedups are with respect to the simulation. We can see that the simulation provides the best performance for small bit-widths only, but it does not scale well. The SAT-based method offers the best scalability and dominates for larger circuits, but it supports the WCE evaluation only. The BDD-based method, like simulation, supports all metrics and significantly outperforms the simulation for larger circuits. Note that, for more complex circuits such as multipliers, we would observe similar results with a worse relative performance of the BDD-based approach.

There also exist other known methods for computing approximation errors of arithmetic circuits, including methods based on BDDs [6] or on a SAT-based miter solution [5]. Compared to ADAC, these methods are less scalable, as demonstrated by the fact that they have been used only for approximating multipliers limited to 8-bit operands and adders limited to 16-bit operands. Apart from that, there are efficient methods for *exact* equivalence checking based on algebraic computations [8,16]. However, no counterparts of these methods are known so far for approximate equivalence checking.

**Fig. 2.** (Left) Performance of error evaluation methods for adders. (Right) A comparison of 16-bit approximate multipliers designed by ADAC vs. the best known solutions.

Next, we compare the quality of approximate circuits obtained using ADAC with circuits that appeared in the literature. We consider 16-bit multipliers since existing approaches are not able to handle larger and more complex circuits. The different points in Fig. 2 correspond to circuits with different trade-offs between the WCE in % and the power-delay product (PDP<sup>2</sup>), which is a key non-functional circuit characteristic. These circuits were obtained using various existing approaches including: (M1) configurable circuits from the lpACLib library [17], (M2) the bit-significance-driven logic compression [15], (M3) the bit-width truncation [10], (M4) compositional techniques [11], and (M5) circuits from the EvoApproxLib library [13]. We can see that only the bit-width truncation provides a quality of results comparable with ADAC (in terms of the PDP reduction for the given WCE), and only for large target errors (20% WCE or more). For small target errors, ADAC clearly dominates.
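
To make the truncation idea (M3) concrete, here is a hypothetical truncation-style approximate multiplier together with an exhaustive WCE computation. The exhaustive loop is feasible only for small bit-widths, which is exactly why ADAC resorts to SAT- and BDD-based evaluation for larger operands:

```python
def approx_mul(a, b, cut=2):
    """Truncation-style approximate multiplier: drop the 'cut' least
    significant bits of both operands before multiplying (an illustrative
    stand-in for bit-width truncation, not a circuit produced by ADAC)."""
    return ((a >> cut) * (b >> cut)) << (2 * cut)

def worst_case_error(bits=8, cut=2):
    """Exhaustive WCE over all operand pairs; only viable for small widths."""
    wce = 0
    for a in range(2 ** bits):
        for b in range(2 ** bits):
            wce = max(wce, abs(a * b - approx_mul(a, b, cut)))
    return wce

print(worst_case_error(bits=8, cut=2))  # 1521, attained at a = b = 255
```

Writing a = 4a' + r_a and b = 4b' + r_b shows the error is 4a'r_b + 4b'r_a + r_a·r_b, which is monotone in every component and hence maximal at a = b = 255.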

Note that, for each target WCE, we performed 30 independent runs of CGP to obtain statistically significant results. For each run, ADAC was executed for 2 h on an Intel Xeon X5670 2.4 GHz processor using a single core. Also note that the individual runs are independent and thus can be easily parallelised.

Further, Fig. 3 presents approximate multipliers of up to 32 bits obtained by ADAC. It shows Pareto fronts representing circuits with different compromises between the WCE in % and the PDP, and demonstrates that ADAC goes beyond the capabilities of existing methods and tools. For each target WCE, ADAC was executed for 4 hours in the case of the 24-bit instances and for 6 hours in the case of the larger instances. Note that a 32-bit exact multiplier requires over 6,300 gates, and, to the best of our

**Fig. 3.** Approximate multipliers designed by ADAC. 100% refers to PDP of the accurate circuits for the given bit-width.

knowledge, ADAC is the first tool that is able to approximate such complex circuits with formal error guarantees.

Besides the approaches mentioned above, there also exist general-purpose methods, such as SALSA [14] or SASIMI [15], which approximate circuits independently of their structure. We were unable to perform a direct comparison with them because their implementations are not available, but based on the published results, ADAC provides significantly better scalability.

**Practical Impacts.** The following list briefly characterises several resource-aware applications that build on approximate circuits. The circuits were obtained using prototype implementations of the above-mentioned approaches that are now integrated in ADAC.

*Approximate multipliers for convolutional neural networks* [14]. In such networks, millions of multiplications have to be performed. The usage of application-specific approximate multipliers led to 90% savings in the power consumption of the data path for a negligible drop in classification accuracy.

<sup>2</sup> PDP characterises both the speed and energy efficiency of the circuit.

*Approximate Adders and Subtractors for a Discrete Convolutional Transformation* [22]. These adders and subtractors were designed to reduce the power consumption in video compression for the High Efficiency Video Coding (HEVC) standard. They show better quality/power trade-offs than implementations available in the literature. For example, a 25% power reduction for the same error was obtained in comparison with a recent highly-optimised implementation.

*Approximate Adders and Multipliers for Image Processing* [20]. These circuits were used in the development of efficient hardware implementations of filters and edge detectors. A 50% reduction was observed in the number of look-up tables used in a field programmable gate array for a negligible drop in the image visual quality.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Probabilistic Systems

## **Value Iteration for Simple Stochastic Games: Stopping Criterion and Learning Algorithm**

Edon Kelmendi, Julia Krämer, Jan Křetínský(B), and Maximilian Weininger

Technical University of Munich, Munich, Germany jan.kretinsky@tum.de

**Abstract.** Simple stochastic games can be solved by value iteration (VI), which yields a sequence of under-approximations of the value of the game. This sequence is guaranteed to converge to the value only in the limit. Since no stopping criterion is known, this technique does not provide any guarantees on its results. We provide the first stopping criterion for VI on simple stochastic games. It is achieved by additionally computing a convergent sequence of *over-approximations* of the value, relying on an analysis of the game graph. Consequently, VI becomes an anytime algorithm returning the approximation of the value and the current error bound. As another consequence, we can provide a simulation-based asynchronous VI algorithm, which yields the same guarantees, but without necessarily exploring the whole game graph.

#### **1 Introduction**

**Simple Stochastic Games.** A simple stochastic game (SG) [Con92] is a zero-sum two-player game played on a graph by Maximizer and Minimizer, who choose actions in their respective vertices (also called states). Each action is associated with a probability distribution determining the next state to move to. The objective of Maximizer is to maximize the probability of reaching a given target state; the objective of Minimizer is the opposite.

Stochastic games constitute a fundamental problem for several reasons. From the theoretical point of view, the complexity of this problem<sup>1</sup> is known to be in **UP** ∩ **coUP** [HK66], but no polynomial-time algorithm is known. Further,

This research was funded in part by the German Excellence Initiative and the European Union Seventh Framework Programme under grant agreement No. 291763 for TUM – IAS, the Studienstiftung des deutschen Volkes project "Formal methods for analysis of attack-defence diagrams", the Czech Science Foundation grant No. 18- 11193S, TUM IGSSE Grant 10.06 (PARSEC), and the German Research Foundation (DFG) project KR 4890/2-1 "Statistical Unbounded Verification".

<sup>1</sup> Formally, the problem is to decide, for a given p ∈ [0, 1], whether Maximizer has a strategy ensuring probability at least p to reach the target.

© The Author(s) 2018

H. Chockler and G. Weissenbacher (Eds.): CAV 2018, LNCS 10981, pp. 623–642, 2018. https://doi.org/10.1007/978-3-319-96145-3\_36

several other important problems can be reduced to SG, for instance parity games, mean-payoff games, discounted-payoff games and their stochastic extensions [CF11]. The task of solving SG is also polynomial-time equivalent to solving perfect information Shapley, Everett and Gillette games [AM09]. Besides, the problem is practically relevant in verification and synthesis. SG can model reactive systems, with players corresponding to the controller of the system and to its environment, where quantified uncertainty is explicitly modelled. This is useful in many application domains, ranging from smart energy management [CFK+13a] to autonomous urban driving [CKSW13], and from robot motion planning [LaV00] to self-adaptive systems [CMG14]; for various recent case studies, see e.g. [SK16]. Finally, since Markov decision processes (MDP) [Put14] are a special case with only one player, SG can serve as abstractions of large MDP [KKNP10].

**Solution Techniques.** There are several classes of algorithms for solving SG, most importantly strategy iteration (SI) algorithms [HK66] and value iteration (VI) algorithms [Con92]. Since the repetitive evaluation of strategies in SI is often slow in practice, VI is usually preferred, similarly to the special case of MDPs [KM17]. For instance, the most widely used probabilistic model checker PRISM [KNP11] and its branch PRISM-Games [CFK+13a] use VI for MDP and SG, respectively, as the default option. However, while SI is in principle a precise method, VI is an approximative method, which converges only in the limit. Unfortunately, there is no known stopping criterion for VI applied to SG. Consequently, there are no guarantees on the results returned in finite time. Therefore, current tools stop when the difference between the two most recent approximations is low, and thus may return arbitrarily imprecise results [HM17].

**Value Iteration with Guarantees.** In the special case of MDP, in order to obtain bounds on the imprecision of the result, one can employ a *bounded* variant of VI [MLG05,BCC+14] (also called *interval iteration* [HM17]). Here one computes not only an under-approximation, but also an over-approximation of the actual value as follows. On the one hand, iterative computation of the least fixpoint of Bellman equations yields an under-approximating sequence converging to the value. On the other hand, iterative computation of the greatest fixpoint yields an over-approximation, which, however, does not converge to the value. Moreover, it often results in the trivial bound of 1. A solution suggested for MDPs [BCC+14,HM17] is to modify the underlying graph, namely to collapse end components. In the resulting MDP there is only one fixpoint, thus the least and greatest fixpoint coincide and both approximating sequences converge to the actual value. In contrast, for general SG no procedure where the greatest fixpoint converges to the value is known. In this paper we provide one, yielding a stopping criterion. We show that the pre-processing approach of collapsing is not applicable in general and provide a solution on the original graph. We also characterize SG where the fixpoints coincide and no processing is needed. The main technical challenge is that states in an end component in SG can have different values, in contrast to the case of MDP.

**Practical Efficiency Using Guarantees.** We further utilize the obtained guarantees to practically improve our algorithm. Similarly to the MDP case [BCC+14], the quantification of the error allows for ignoring parts of the state space, and thus a speed-up without jeopardizing the correctness of the result. Indeed, we provide a technique where some states are not explored and processed at all, but their potential effect is still taken into account. The information is further used to decide which states to explore next and to analyze in more detail. To this end, simulations and learning are used as tools. While for MDP this idea has already demonstrated speed-ups of orders of magnitude [BCC+14,ACD+17], this paper provides the first technique of this kind for SG. **Our contribution** is summarized as follows:

- We introduce a convergent sequence of over-approximations of the value of an SG, yielding the first stopping criterion for VI on SGs and turning VI into an anytime algorithm that returns the current approximation together with its error bound.
- We provide a simulation-based asynchronous (learning) variant of the algorithm with the same guarantees, which does not necessarily explore the whole game graph.

**Related Work.** The works closest to ours are the following. As mentioned above, [BCC+14,HM17] describe the solution to the special case of MDP. While [BCC+14] also provides a learning-based algorithm, [HM17] discusses the convergence rate and the exact solution. The basic algorithm of [HM17] is implemented in PRISM [BKL+17] and the learning approach of [BCC+14] in Storm [DJKV17a]. The extension for SG where the interleaving of players is severely limited (every end component belongs to one player only) is discussed in [Ujm15].

Further, in the area of probabilistic planning, bounded real-time dynamic programming [MLG05] is related to our learning-based approach. However, it is limited to the setting of stopping MDP, where the target sink or the non-target sink is reached almost surely under any pair of strategies and thus the fixpoints coincide. Our algorithm works for general SG, not only stopping ones, without any blowup.

For SG, the tools implementing the standard SI and/or VI algorithms are PRISM-games [CFK+13a], GAVS+ [CKLB11] and GIST [CHJR10]. The latter two are, however, neither maintained nor accessible via the links provided in their publications any more.

Apart from fundamental algorithms to solve SG, there are various practically efficient heuristics that, however, provide no or only weak guarantees, often based on some form of learning [BT00,LL08,WT16,TT16,AY17,BBS08]. Finally, the only currently available way to obtain any guarantees through VI is to perform γ<sup>2</sup> iterations and then round to the nearest multiple of 1/γ, yielding the value of the game with precision 1/γ [CH08]; here γ cannot be freely chosen, but is a fixed number, exponential in the number of states and the used probability denominators. Since the precision cannot be chosen and the number of iterations is always exponential, this approach is infeasible even for small games.

**Organization of the Paper.** Section 2 introduces the basic notions and recalls value iteration. Section 3 explains the idea of our approach on an example. Section 4 provides a full technical treatment of the method as well as the learning-based variation. Section 5 discusses experimental results and Sect. 6 concludes. The appendix (available in [KKKW18]) gives technical details on the pseudocode as well as the conducted experiments, and provides more extensive proofs of the theorems and lemmata; this paper contains only proof sketches and ideas.

### **2 Preliminaries**

#### **2.1 Basic Definitions**

A probability distribution on a finite set X is a mapping δ : X → [0, 1] such that ∑<sub>x∈X</sub> δ(x) = 1. The set of all probability distributions on X is denoted by D(X). Now we define stochastic games, in the literature often referred to as simple stochastic games or stochastic two-player games with a reachability objective.

**Definition 1 (SG).** *A* stochastic game (SG) *is a tuple* (S, S<sub>□</sub>, S<sub>○</sub>, s<sub>0</sub>, A, Av, δ, 1, 0)*, where* S *is a finite set of* states *partitioned into the sets* S<sub>□</sub> *and* S<sub>○</sub> *of states of the players* Maximizer *and* Minimizer*, respectively;* s<sub>0</sub>, 1, 0 ∈ S *are the* initial *state, the* target *state, and the* sink *state, respectively;* A *is a finite set of* actions*;* Av : S → 2<sup>A</sup> *assigns to every state a set of* available *actions; and* δ : S × A → D(S) *is a* transition function *that, given a state* s *and an action* a ∈ Av(s)*, yields a probability distribution over* successor *states.*

*A* Markov decision process (MDP) *is a special case of SG where* S<sub>○</sub> = ∅*.*

We assume that SGs are non-blocking, so for all states s we have Av(s) ≠ ∅. Further, 1 and 0 only have one action each, a self-loop with probability 1. Additionally, we can assume that the SG is preprocessed so that all states with no path to 1 are merged with 0.

For a state s and an available action a ∈ Av(s), we denote the set of successors by Post(s, a) := {s′ | δ(s, a, s′) > 0}. Finally, for any set of states T ⊆ S, we use T<sub>□</sub> and T<sub>○</sub> to denote the states in T that belong to Maximizer and Minimizer, whose states are drawn in the figures as squares and circles, respectively.
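
To make the notation concrete, the tuple of Definition 1 can be sketched in code. The dict-based encoding below (field names, states and actions included) is purely illustrative and not from the paper.

```python
# Hypothetical dict-based encoding of the SG tuple from Definition 1;
# all field names, states and actions are illustrative.
SG = {
    "max_states": {"s"},                  # S_square: Maximizer's states
    "min_states": {"t"},                  # S_circle: Minimizer's states
    "init": "s",
    "avail": {                            # Av : S -> 2^A
        "s": ["a"],
        "t": ["b"],
        "1": ["loop"],                    # target: one self-loop action
        "0": ["loop"],                    # sink:   one self-loop action
    },
    "delta": {                            # delta : S x A -> D(S)
        ("s", "a"): {"t": 0.5, "1": 0.5},
        ("t", "b"): {"0": 1.0},
        ("1", "loop"): {"1": 1.0},
        ("0", "loop"): {"0": 1.0},
    },
}

def post(g, s, a):
    """Post(s, a): successors reachable with positive probability."""
    return {s2 for s2, p in g["delta"][(s, a)].items() if p > 0}

# Each delta(s, a) must be a probability distribution over successors.
assert all(abs(sum(d.values()) - 1.0) < 1e-9 for d in SG["delta"].values())
```
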

The semantics of SG is given in the usual way by means of strategies, the induced Markov chain, and the respective probability space, as follows. An *infinite path* ρ is an infinite sequence ρ = s<sub>0</sub>a<sub>0</sub>s<sub>1</sub>a<sub>1</sub> ··· ∈ (S × A)<sup>ω</sup> such that for every i ∈ ℕ, a<sub>i</sub> ∈ Av(s<sub>i</sub>) and s<sub>i+1</sub> ∈ Post(s<sub>i</sub>, a<sub>i</sub>). *Finite paths* are defined analogously as elements of (S × A)<sup>∗</sup> × S. Since this paper deals with the reachability objective, we can restrict our attention to memoryless strategies, which are optimal for this objective. We still allow randomizing strategies, because they are needed for the learning-based algorithm later on. A *strategy* of Maximizer or Minimizer is a function σ : S<sub>□</sub> → D(A) or τ : S<sub>○</sub> → D(A), respectively, such that σ(s) ∈ D(Av(s)) for all s. We call a strategy *deterministic* if it maps to Dirac distributions only. Note that there are finitely many deterministic strategies. A pair (σ, τ) of strategies of Maximizer and Minimizer induces a Markov chain G<sup>σ,τ</sup> where the transition probabilities are defined as δ(s, s′) = ∑<sub>a∈Av(s)</sub> σ(s, a) · δ(s, a, s′) for states of Maximizer, and analogously for states of Minimizer, with σ replaced by τ. The Markov chain induces a unique probability distribution P<sup>σ,τ</sup><sub>s</sub> over measurable sets of infinite paths [BK08, Chap. 10].

We write ◇1 := {ρ | ∃i ∈ ℕ. ρ(i) = 1} to denote the (measurable) set of all paths which eventually reach 1. For each s ∈ S, we define the *value* in s as

$$\mathsf{V}(\mathsf{s}) := \sup\_{\sigma} \inf\_{\tau} \mathbb{P}\_\mathsf{s}^{\sigma,\tau}(\diamondsuit 1) = \inf\_{\tau} \sup\_{\sigma} \mathbb{P}\_\mathsf{s}^{\sigma,\tau}(\diamondsuit 1),$$

where the equality follows from [Mar75]. We are interested not only in V(s0), but also its ε-approximations and the corresponding (ε-)optimal strategies for both players.

Now we recall a fundamental tool for the analysis of MDP called end components. We introduce the following notation. Given a set of states T ⊆ S, a state s ∈ T and an action a ∈ Av(s), we say that (s, a) *exits* T if Post(s, a) ⊈ T. We define an end component of an SG as the end component of the underlying MDP with both players unified.

**Definition 2 (EC).** *A non-empty set* T ⊆ S *of states is an* end component (EC) *if there is a non-empty set* B ⊆ ⋃<sub>s∈T</sub> Av(s) *of actions such that (1) for each* s ∈ T *and* a ∈ B ∩ Av(s) *we have* Post(s, a) ⊆ T*, and (2) for each* s, s′ ∈ T *there is a finite path from* s *to* s′ *that stays inside* T *and uses only actions in* B*.*


Intuitively, ECs correspond to bottom strongly connected components of the Markov chains induced by possible strategies, so for some pair of strategies all possible paths starting in the EC remain there. An end component T is a *maximal end component (MEC)* if there is no other end component T′ such that T ⊊ T′. Given an SG G, the set of its MECs is denoted by MEC(G) and can be computed in polynomial time [CY95].
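
The MEC decomposition mentioned above can be sketched as follows: repeatedly split a candidate set into SCCs, keeping only actions whose successors stay inside it. This is a simplified sketch of the standard polynomial-time approach [CY95]; the dict encoding and the toy example are assumptions, not the paper's pseudocode.

```python
def sccs(nodes, edges):
    """Tarjan's algorithm: strongly connected components of (nodes, edges)."""
    index, low, on_stack, stack, comps, n = {}, {}, set(), [], [], [0]

    def dfs(v):
        index[v] = low[v] = n[0]
        n[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in edges.get(v, ()):
            if w not in index:
                dfs(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            comps.append(frozenset(comp))

    for v in nodes:
        if v not in index:
            dfs(v)
    return comps

def mec_decomposition(states, avail, post):
    """Refine candidates into SCCs (using only actions staying inside)
    until every remaining candidate is a maximal end component."""
    result, work = [], [frozenset(states)]
    while work:
        T = work.pop()
        stay = {s: [a for a in avail[s] if post[(s, a)] <= T] for s in T}
        edges = {s: set().union(*(post[(s, a)] for a in stay[s]))
                 if stay[s] else set() for s in T}
        comps = sccs(T, edges)
        if comps == [T] and all(stay[s] for s in T):
            result.append(set(T))             # T itself is a MEC
        else:
            for c in comps:
                # keep only states that still have an action staying in c
                sub = frozenset(s for s in c
                                if any(post[(s, a)] <= c for a in avail[s]))
                if sub and sub != T:
                    work.append(sub)
    return result

# Toy MDP loosely modelled on Fig. 1 (left); names are assumptions.
states = {"s", "t", "1", "0"}
avail = {"s": ["a"], "t": ["b", "c"], "1": ["loop"], "0": ["loop"]}
post_map = {("s", "a"): {"t"}, ("t", "b"): {"s"}, ("t", "c"): {"1", "0"},
            ("1", "loop"): {"1"}, ("0", "loop"): {"0"}}
mecs = mec_decomposition(states, avail, post_map)
```

Here the EC {s, t} (closed under a and b) is found as a MEC, besides the trivial MECs of the target and the sink.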

#### **2.2 (Bounded) Value Iteration**

The value function V satisfies the following system of equations, which is referred to as the *Bellman equations*:

$$\mathsf{V}(\mathsf{s}) = \begin{cases} \max\_{\mathsf{a} \in \mathsf{Av}(\mathsf{s})} \mathsf{V}(\mathsf{s}, \mathsf{a}) & \text{if } \mathsf{s} \in S\_{\square} \\ \min\_{\mathsf{a} \in \mathsf{Av}(\mathsf{s})} \mathsf{V}(\mathsf{s}, \mathsf{a}) & \text{if } \mathsf{s} \in S\_{\odot} \\ 1 & \text{if } \mathsf{s} = 1 \\ 0 & \text{if } \mathsf{s} = 0 \end{cases} \tag{1}$$

where<sup>2</sup>

$$\mathsf{V}(\mathsf{s}, \mathsf{a}) := \sum\_{s' \in S} \delta(\mathsf{s}, \mathsf{a}, \mathsf{s}') \cdot \mathsf{V}(\mathsf{s}') \tag{2}$$

Moreover, V is the *least* solution to the Bellman equations, see e.g. [CH08]. To compute V for all states of an SG, one can thus utilize the iterative approximation method *value iteration (VI)* as follows. We start with a lower bound function L<sub>0</sub> : S → [0, 1] such that L<sub>0</sub>(1) = 1 and, for all other s ∈ S, L<sub>0</sub>(s) = 0. Then we repetitively apply Bellman updates (3) and (4)

$$\mathsf{L}\_{n}(\mathsf{s},\mathsf{a}) := \sum\_{s' \in S} \delta(\mathsf{s},\mathsf{a},\mathsf{s}') \cdot \mathsf{L}\_{n-1}(\mathsf{s}') \tag{3}$$

$$\mathsf{L}\_{n}(\mathsf{s}) := \begin{cases} \max\_{\mathsf{a} \in \mathsf{Av}(\mathsf{s})} \mathsf{L}\_{n}(\mathsf{s}, \mathsf{a}) & \text{if } \mathsf{s} \in S\_{\square} \\ \min\_{\mathsf{a} \in \mathsf{Av}(\mathsf{s})} \mathsf{L}\_{n}(\mathsf{s}, \mathsf{a}) & \text{if } \mathsf{s} \in S\_{\odot} \end{cases} \tag{4}$$

until convergence. Note that convergence may happen only in the limit, even for a game as simple as the one in Fig. 1 on the left. The sequence is monotonic, at all times a *lower* bound on V, i.e. L<sub>i</sub>(s) ≤ V(s) for all s ∈ S, and the least fixpoint satisfies L<sup>∗</sup> := lim<sub>n→∞</sub> L<sub>n</sub> = V.
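
As a toy illustration of updates (3) and (4) (using an assumed dict encoding), the following sketch shows a one-state game where the lower bounds approach the value but never reach it in finitely many steps:

```python
def bellman_lower(L, max_states, avail, delta):
    """One application of updates (3) and (4) to the lower bound L."""
    return {s: (max if s in max_states else min)(
                sum(p * L[s2] for s2, p in delta[(s, a)].items())
                for a in avail[s])
            for s in avail}

# One Maximizer state s whose single action stays in s w.p. 1/2 and
# reaches the target "1" w.p. 1/2, so V(s) = 1 but L_n(s) = 1 - 2**-n.
avail = {"s": ["a"], "1": ["loop"], "0": ["loop"]}
delta = {("s", "a"): {"s": 0.5, "1": 0.5},
         ("1", "loop"): {"1": 1.0}, ("0", "loop"): {"0": 1.0}}
L = {"s": 0.0, "1": 1.0, "0": 0.0}
for _ in range(20):
    L = bellman_lower(L, {"s"}, avail, delta)
# L["s"] is now 1 - 2**-20: converging to V(s) = 1, but only in the limit
```
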

Unfortunately, there is no known stopping criterion, i.e. no guarantees on how close the current under-approximation is to the value [HM17]. The current tools stop when the difference between two successive approximations is smaller than a certain threshold, which can lead to arbitrarily wrong results [HM17].

For the special case of MDP, it has been suggested to also compute the greatest fixpoint [MLG05] and thus an *upper* bound as follows. The function G<sub>0</sub> : S → [0, 1] is initialized as G<sub>0</sub>(s) = 1 for all states s ∈ S except for G<sub>0</sub>(0) = 0. Then we repetitively apply updates (3) and (4), where L is replaced by G. The resulting sequence G<sub>n</sub> is monotonic, provides an upper bound on V, and the greatest fixpoint G<sup>∗</sup> := lim<sub>n→∞</sub> G<sub>n</sub> is the greatest solution to the Bellman equations on [0, 1]<sup>S</sup>.

This approach is called *bounded value iteration (BVI)* (or *bounded real-time dynamic programming (BRTDP)* [MLG05,BCC+14] or *interval iteration* [HM17]). If L<sup>∗</sup> = G<sup>∗</sup>, then both are equal to V and we say that *BVI converges*. BVI is guaranteed to converge in MDP if the only ECs are those of 1 and 0 [BCC+14]. Otherwise, if there are non-trivial ECs, they have to be "collapsed"<sup>3</sup>. Computing the greatest fixpoint on the modified MDP results in another sequence U<sub>i</sub> of upper bounds on V, converging to U<sup>∗</sup> := lim<sub>n→∞</sub> U<sub>n</sub>. When transformed this way, BVI converges even for general MDPs, i.e. U<sup>∗</sup> = V [BCC+14]. The next section illustrates this difficulty and the solution through collapsing on an example.

<sup>2</sup> Throughout the paper, for any function f : S → [0, 1] we overload the notation and also write f(s, a), meaning ∑<sub>s′∈S</sub> δ(s, a, s′) · f(s′).

<sup>3</sup> All states of an EC are merged into one, all leaving actions are preserved and all other actions are discarded. For more detail see [KKKW18, Appendix A.1].

In summary, all versions of BVI discussed so far and later on in the paper follow the pattern of Algorithm 1. In the naive version, UPDATE just performs the Bellman update on L and U according to Eqs. (3) and (4).<sup>4</sup> For a general MDP, U does not converge to V but to G<sup>∗</sup>, and thus the termination criterion may never be met if G<sup>∗</sup>(s<sub>0</sub>) − V(s<sub>0</sub>) > 0. If the ECs are collapsed in pre-processing, then U converges to V.

For the general case of SG, the collapsing approach fails and this paper provides another version of BVI where U converges to V, based on a more detailed structural analysis of the game.
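
The pattern of Algorithm 1 can be sketched as an anytime loop; the interface (an UPDATE callback refining both bounds) is an assumption for illustration. With the naive UPDATE of Eqs. (3) and (4), the loop may never meet the criterion, as discussed above.

```python
def bvi(L, U, s0, update, eps=1e-6, max_iter=10**6):
    """Anytime loop of Algorithm 1: refine both bounds until the gap at the
    initial state is at most eps (or the iteration budget runs out)."""
    for _ in range(max_iter):
        if U[s0] - L[s0] <= eps:
            break
        L, U = update(L, U)
    return L, U      # whenever update is sound: L[s0] <= V(s0) <= U[s0]

# Dummy UPDATE that merely halves the gap, to exercise the loop shape:
halve = lambda L, U: (L, {s: (U[s] + L[s]) / 2 for s in U})
L, U = bvi({"s0": 0.0}, {"s0": 1.0}, "s0", halve)
```
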


#### **3 Example**

In this section, we illustrate the issues preventing BVI convergence and our solution on a few examples. Recall that G is the sequence converging to the greatest solution of the Bellman equations, while U is in general any sequence over-approximating V that one or another BVI algorithm suggests.

Firstly, we illustrate the issue that arises already for the special case of MDP. Consider the MDP of Fig. 1 on the left. Although V(s) = V(t) = 0.5, we have G<sub>i</sub>(s) = G<sub>i</sub>(t) = 1 for all i. Indeed, the upper bound for t is always updated as the maximum of G<sub>i</sub>(t, c) and G<sub>i</sub>(t, b). Although G<sub>i</sub>(t, c) decreases over time, G<sub>i</sub>(t, b) remains the same, namely equal to G<sub>i</sub>(s), which in turn remains equal to G<sub>i</sub>(s, a) = G<sub>i</sub>(t). This cyclic dependency lets both s and t remain in an "illusion" that the value of the other one is 1.
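
This "illusion" is easy to reproduce numerically. The sketch below (with an assumed encoding of a Fig. 1-style MDP, all states Maximizer's) iterates the upper-bound update and stays stuck at 1 on the EC {s, t}, although the value is 0.5:

```python
def upper_update(G, avail, delta):
    """Bellman update of the upper bound; all states here are Maximizer's."""
    return {s: max(sum(p * G[s2] for s2, p in delta[(s, a)].items())
                   for a in avail[s]) for s in avail}

avail = {"s": ["a"], "t": ["b", "c"], "1": ["loop"], "0": ["loop"]}
delta = {("s", "a"): {"t": 1.0},                 # s -> t
         ("t", "b"): {"s": 1.0},                 # t -> s (closes the EC)
         ("t", "c"): {"1": 0.5, "0": 0.5},       # only action leaving {s, t}
         ("1", "loop"): {"1": 1.0}, ("0", "loop"): {"0": 1.0}}
G = {"s": 1.0, "t": 1.0, "1": 1.0, "0": 0.0}
for _ in range(100):
    G = upper_update(G, avail, delta)
# G is stuck at 1 on the EC {s, t}, although V(s) = V(t) = 0.5
```
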

The solution for MDP is to remove this cyclic dependency by collapsing all MECs into singletons and removing the resulting purely self-looping actions. Figure 1 in the middle shows the MDP after collapsing the EC {s,t}. This turns the MDP into a stopping one, where 1 or 0 is under any strategy reached with probability 1. In such MDP, there is a unique solution to the Bellman equations. Therefore, the greatest fixpoint is equal to the least one and thus to V.

<sup>4</sup> For the straightforward pseudocode, see [KKKW18, Appendix A.2].

Secondly, we illustrate the issues that additionally arise for general SG. It turns out that the collapsing approach can be extended only to games where all states of each EC belong to one player only [Ujm15]. In this case, both Maximizer's and Minimizer's ECs are collapsed the same way as in MDP.

However, when both players are present in an EC, then collapsing may not solve the issue. Consider the SG of Fig. 2. Here α and β represent the values of the respective actions.<sup>5</sup> There are three cases:

First, let α < β. If the bounds converge to these values, we eventually observe G<sub>i</sub>(q, e) < L<sub>i</sub>(r, f) and learn the induced inequality. Since p is a Minimizer's state, it will never pick the action leading to the greater value of β. Therefore, we can safely merge p and q, and remove the action leading to r, as shown in the second subfigure.

Second, if α > β, then p and r can be merged in an analogous way, as shown in the third subfigure.

Third, if α = β, both previous solutions, as well as collapsing all three states as in the fourth subfigure, are possible. However, since the approximants may only converge to α and β in the limit, we may not know in finite time which of these cases applies and thus cannot decide on any of the collapses.

Consequently, the approach of collapsing is not applicable in general. In order to ensure BVI convergence, we suggest a different method, which we call *deflating*. It does not involve changing the state space, but rather decreasing the upper bound U<sup>i</sup> to the least value that is currently provable (and thus still correct). To this end, we analyze the exiting actions, i.e. with successors outside of the EC, for the following reason. If the play stays in the EC forever, the target is never reached and Minimizer wins. Therefore, Maximizer needs to pick some exiting action to avoid staying in the EC.

**Fig. 1.** Left: An MDP (as special case of SG) where BVI does not converge due to the grayed EC. Middle: The same MDP where the EC is collapsed, making BVI converge. Right: The approximations illustrating the convergence of the MDP in the middle.

<sup>5</sup> Precisely, we consider them to stand for a probabilistic branching with probability α (or β) to 1 and with the remaining probability to 0. To avoid clutter in the figure, we omit this branching and depict only the value.

**Fig. 2.** Left: Collapsing ECs in SG may lead to incorrect results. The Greek letters on the leaving arrows denote the values of the exiting actions. Right three figures: Correct collapsing in different cases, depending on the relationship of α and β. In contrast to MDP, some actions of the EC exiting the collapsed part have to be removed.

For the EC with the states s and t in Fig. 1, the only exiting action is c. In this example, since c is the only exiting action, U<sub>i</sub>(t, c) is the highest possible upper bound that the EC can achieve. Thus, by decreasing the upper bound of all states in the EC to that number<sup>6</sup>, we still have a safe upper bound. Moreover, with this modification BVI converges in this example, intuitively because now the upper bound of t depends on action c, as it should.
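
On the same kind of toy MDP, deflating can be sketched as capping the upper bound of every EC state at the best value of an action exiting the EC after each Bellman update. The encoding is an assumption, and since all EC states here belong to Maximizer, the cap is simply the maximum over all exiting actions:

```python
def upper_update(U, avail, delta):
    """Bellman update of the upper bound (all states Maximizer's)."""
    return {s: max(sum(p * U[s2] for s2, p in delta[(s, a)].items())
                   for a in avail[s]) for s in avail}

def deflate(U, ec, avail, delta):
    """Cap the upper bound of every EC state at the best exiting value."""
    exits = [sum(p * U[s2] for s2, p in delta[(s, a)].items())
             for s in ec for a in avail[s]
             if set(delta[(s, a)]) - ec]         # (s, a) exits the EC
    cap = max(exits, default=0.0)
    return {s: min(U[s], cap) if s in ec else U[s] for s in U}

avail = {"s": ["a"], "t": ["b", "c"], "1": ["loop"], "0": ["loop"]}
delta = {("s", "a"): {"t": 1.0}, ("t", "b"): {"s": 1.0},
         ("t", "c"): {"1": 0.5, "0": 0.5},
         ("1", "loop"): {"1": 1.0}, ("0", "loop"): {"0": 1.0}}
U = {"s": 1.0, "t": 1.0, "1": 1.0, "0": 0.0}
for _ in range(10):
    U = deflate(upper_update(U, avail, delta), {"s", "t"}, avail, delta)
# Deflating lets the upper bounds on the EC drop to the true value 0.5
```
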

For the example in Fig. 2, it is correct to decrease the upper bound to the maximal exiting one, i.e. max{α̂, β̂}, where α̂ := U<sub>i</sub>(a) and β̂ := U<sub>i</sub>(b) are the current approximations of α and β. However, this itself does not ensure BVI convergence. Indeed, if for instance α̂ < β̂, then deflating all states to β̂ is not tight enough, as the values of p and q can even be bounded by α̂. In fact, we have to find a certain sub-EC that corresponds to α̂, in this case {p, q}, and set all its upper bounds to α̂. We define and compute these sub-ECs in the next section.

In summary, the general structure of our convergent BVI algorithm is to produce the sequence U by application of Bellman updates and occasionally find the relevant sub-ECs and deflate them. The main technical challenge is that states in an EC in SG can have different values, in contrast to the case of MDP.

#### **4 Convergent Over-Approximation**

In Sect. 4.1, we characterize SGs where the Bellman equations have multiple solutions. Based on this analysis, the subsequent sections show how to alter the procedure computing the sequence G<sub>i</sub> over-approximating V so that the resulting tighter sequence U<sub>i</sub> still over-approximates V, but also converges to V. This ensures that the thus-modified BVI converges. Section 4.4 presents the learning-based variant of our BVI.

<sup>6</sup> We choose the name "deflating" to evoke decreasing the overly high "pressure" in the EC until it equalizes with the actual "pressure" outside.

#### **4.1 Bloated End Components Cause Non-convergence**

As we have seen in the example of Fig. 2, BVI generally does not converge due to ECs with a particular structure of the exiting actions. The analysis of ECs relies on the extremal values that can be achieved by exiting actions (in the example, α and β). Given the value function V or just its current over-approximation U<sub>i</sub>, we define the most profitable exiting action for Maximizer (denoted by □) and Minimizer (denoted by ○) as follows.

**Definition 3 (**bestExit**).** *Given a set of states* T ⊆ *S and a function* f : *S* → [0, 1] *(see footnote 2), the* f*-value of the best* T*-exiting action of Maximizer and Minimizer, respectively, is defined as*

$$\mathsf{bestExit}\_{f}^{\square}(T) = \max\_{\substack{\mathsf{s} \in T\_{\square} \\ (\mathsf{s},\mathsf{a}) \text{ exits } T}} f(\mathsf{s},\mathsf{a})$$

$$\mathsf{bestExit}\_{f}^{\odot}(T) = \min\_{\substack{\mathsf{s} \in T\_{\odot} \\ (\mathsf{s},\mathsf{a}) \text{ exits } T}} f(\mathsf{s},\mathsf{a})$$

*with the convention that* max<sub>∅</sub> = 0 *and* min<sub>∅</sub> = 1*.*
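
Definition 3 translates directly into a small helper. The dict encoding and the Fig. 2-style example values (α = 0.3 and β = 0.6 for the exiting actions of q and r) are assumptions:

```python
def best_exit(T, owner, avail, post, f, maximizer):
    """bestExit of Definition 3: extremal f-value of `owner`'s T-exiting
    actions, with max over an empty set = 0 and min over an empty set = 1."""
    vals = [f[(s, a)] for s in T & owner for a in avail[s]
            if not post[(s, a)] <= T]            # (s, a) exits T
    return max(vals, default=0.0) if maximizer else min(vals, default=1.0)

# Fig. 2-style example: T = {p, q, r}; p is Minimizer's, q and r are
# Maximizer's; only q and r have exiting actions, with assumed values.
T = {"p", "q", "r"}
avail = {"p": ["to_q", "c"], "q": ["e", "back"], "r": ["g", "back"]}
post = {("p", "to_q"): {"q"}, ("p", "c"): {"r"},
        ("q", "e"): {"out"}, ("q", "back"): {"p"},
        ("r", "g"): {"out"}, ("r", "back"): {"p"}}
f = {("q", "e"): 0.3, ("r", "g"): 0.6}
```

Maximizer's best exit is the larger of the two values, while Minimizer, having no exiting action at all, gets the empty-minimum convention 1.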

*Example 1.* In the example of Fig. 2 on the left with T = {p, q, r} and α < β, we have bestExit<sup>□</sup><sub>V</sub>(T) = β and bestExit<sup>○</sup><sub>V</sub>(T) = 1. It is due to β < 1 that BVI does not converge here. We generalize this in the following lemma.

**Lemma 1.** *Let* T *be an EC. For every* m *satisfying* bestExit<sup>□</sup><sub>V</sub>(T) ≤ m ≤ bestExit<sup>○</sup><sub>V</sub>(T)*, there is a solution* f : S → [0, 1] *to the Bellman equations which on* T *is constant and equal to* m*.*

*Proof (Idea).* Intuitively, such a constant m is a solution to the Bellman equations on T for the following reasons. As both players prefer getting m to exiting and getting "only" the values of their respective bestExit, they both choose to stay in the EC (and the extrema in the Bellman equations are realized on nonexiting actions). On the one hand, Maximizer (Bellman equations with max) is hoping for the promised m, which is however not backed up by any actions actually exiting towards the target. On the other hand, Minimizer (Bellman equations with min) does not realize that staying forever results in her optimal value 0 instead of m. 

**Corollary 1.** *If* bestExit<sup>○</sup><sub>V</sub>(T) > bestExit<sup>□</sup><sub>V</sub>(T) *for some EC* T*, then* G<sup>∗</sup> ≠ V*.*

*Proof.* Since there are m<sub>1</sub>, m<sub>2</sub> such that bestExit<sup>□</sup><sub>V</sub>(T) < m<sub>1</sub> < m<sub>2</sub> < bestExit<sup>○</sup><sub>V</sub>(T), by Lemma 1 there are two different solutions to the Bellman equations. In particular, G<sup>∗</sup> > L<sup>∗</sup> = V, and BVI does not converge.

In accordance with our intuition that ECs satisfying the above inequality should be deflated, we call them bloated.

**Definition 4 (BEC).** *An EC* T *is called a* bloated end component (BEC) *if* bestExit<sup>○</sup><sub>V</sub>(T) > bestExit<sup>□</sup><sub>V</sub>(T)*.*

*Example 2.* In the example of Fig. 2 on the left with α < β, the ECs {p, q} and {p, q, r} are BECs.

*Example 3.* If an EC T has no exiting actions of Minimizer (or no Minimizer's states at all, as in an MDP), then bestExit<sup>○</sup><sub>V</sub>(T) = 1 (the case with min<sub>∅</sub>). Hence all numbers between bestExit<sup>□</sup><sub>V</sub>(T) and 1 are a solution to the Bellman equations and G<sup>∗</sup>(s) = 1 for all states s ∈ T.

Analogously, if Maximizer does not have any exiting action in T, then bestExit<sup>□</sup><sub>V</sub>(T) = 0 (the case with max<sub>∅</sub>), T is a BEC, and all numbers between 0 and bestExit<sup>○</sup><sub>V</sub>(T) are a solution to the Bellman equations.

Note that in MDP all ECs belong to one player, namely Maximizer. Consequently, every EC is a BEC except for those where Maximizer has an exiting action with value 1; all other ECs thus have to be collapsed (or deflated) to ensure BVI convergence in MDPs. Interestingly, while in MDPs all non-trivial ECs are a problem, in SGs the presence of the other player lets some ECs converge, namely when both players want to exit (see e.g. [KKKW18, Appendix A.3]).

We show that BECs are indeed the only obstacle for BVI convergence.

**Theorem 1.** *If the SG contains no BECs except for* {0} *and* {1}*, then* G<sup>∗</sup> = V*.*

*Proof (Sketch).* Assume, towards a contradiction, that there is some state s with a positive difference G∗(s) − V(s) > 0. Consider the set D of states with the maximal difference. D can be shown to be an EC. Since it is not a BEC there has to be an action exiting D and realizing the optimum in that state. Consequently, this action also has the maximal difference, and all its successors, too. Since some of the successors are outside of D, we get a contradiction with the maximality of D. 

In Sect. 4.2, we show how to eliminate BECs by collapsing their "core" parts, called below MSECs (maximal simple end components). Since MSECs can only be identified with enough information about V, Sect. 4.3 shows how to avoid direct *a priori* collapsing and instead dynamically deflate candidates for MSECs in a conservative way.

#### **4.2 Static MSEC Decomposition**

Now we turn our attention to SG with BECs. Intuitively, since in a BEC all Minimizer's exiting actions have a higher value than what Maximizer can achieve, Minimizer does not want to use any of her own exiting actions and prefers staying in the EC (or steering Maximizer towards his worse exiting actions). Consequently, only Maximizer wants to take an exiting action. In the MDP case he can pick any desirable one. Indeed, he can wait until he reaches a state where it is available. As a result, in MDP all states of an EC have the *same value* and can all be collapsed into one state. In the SG case, he may be restricted by Minimizer's behaviour, or even not given any chance to exit the EC at all. As a result, a BEC may contain several parts (below denoted MSECs), each with a different value, intuitively corresponding to different exits. Thus, instead of MECs, we have to decompose into the finer MSECs and only collapse these.

**Definition 5 (Simple EC).** *An EC* T *is called* simple (SEC) *if for all* s ∈ T *we have* V(s) = bestExit<sup>□</sup><sub>V</sub>(T)*.*

*A SEC* C *is* maximal (MSEC) *if there is no SEC* C′ *such that* C ⊊ C′*.*

Intuitively, an EC is simple, if Minimizer cannot keep Maximizer away from his bestExit. Independently of Minimizer's decisions, Maximizer can reach the bestExit almost surely, unless Minimizer decides to leave, in which case Maximizer could achieve an even higher value.

*Example 4.* Assume α<β in the example of Fig. 2. Then {p, q} is a SEC and an MSEC. Further observe that action c is sub-optimal for Minimizer and removing it does not affect the value of any state, but simplifies the graph structure. Namely, it destructs the whole EC into several (here only one) SECs and some non-EC states (here r).

Algorithm 2, called FIND MSEC, shows how to compute MSECs. It returns the set of all MSECs if called with parameter V. However, later we also call this function with other parameters f : *S* → [0, 1]. The idea of the algorithm is the following. The set X consists of Minimizer's sub-optimal actions, i.e., those leading to a higher value. As such they cannot be part of any SEC and thus should be ignored when identifying SECs. (The previous example illustrates that ignoring X is indeed safe, as it does not change the value of the game.) We denote by G[Av/Av′] the game G where the available actions Av are changed to the new available actions Av′ (ignoring Minimizer's sub-optimal ones). Once these actions are removed, Minimizer has no choices left to affect the value, and thus each EC is simple.
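The two steps of FIND MSEC (remove X, then decompose into MECs) can be sketched as follows. This is a toy illustration only: the dictionary encoding of the game, all helper names, and the naive MEC decomposition are our assumptions, not the paper's implementation.

```python
# `game[s]` maps each action available in s to its successor distribution
# {state: probability}; `owner[s]` is "max" (Maximizer) or "min" (Minimizer).

def sccs(edges, nodes):
    """Tarjan's strongly connected components (recursive; fine for toy games)."""
    index, low, on_stack, stack, comps = {}, {}, set(), [], []
    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v); on_stack.add(v)
        for w in edges.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            comp = set()
            while True:
                w = stack.pop(); on_stack.discard(w); comp.add(w)
                if w == v:
                    break
            comps.append(frozenset(comp))
    for v in nodes:
        if v not in index:
            visit(v)
    return comps

def mecs(game):
    """Naive MEC decomposition: repeatedly drop actions leaving their SCC."""
    avail = {s: dict(acts) for s, acts in game.items()}
    while True:
        edges = {s: {t for d in acts.values() for t in d} for s, acts in avail.items()}
        comp_of = {s: c for c in sccs(edges, avail) for s in c}
        pruned = False
        for s, acts in avail.items():
            for a in [a for a in acts if any(comp_of[t] != comp_of[s] for t in acts[a])]:
                del acts[a]
                pruned = True
        if not pruned:
            # keep only components in which every state still has an action
            return {c for c in comp_of.values() if all(avail[s] for s in c)}

def find_msec(game, owner, f):
    """Remove Minimizer's sub-optimal actions (the set X: actions whose value
    under f exceeds f(s)); the MECs of the restricted game G[Av/Av'] are then
    the candidate MSECs."""
    keep = lambda s, d: owner[s] == "max" or sum(p * f[t] for t, p in d.items()) <= f[s]
    return mecs({s: {a: d for a, d in acts.items() if keep(s, d)}
                 for s, acts in game.items()})

# Toy game inspired by the discussion of Fig. 2 (our own numbers): the EC
# {p, q} is simple; Minimizer's action c from r is sub-optimal and removed,
# which "destructs" the larger EC and leaves r outside every SEC.
game = {"p": {"loop": {"q": 1.0}, "exit": {"goal": 1.0}},
        "q": {"loop": {"p": 1.0}},
        "r": {"c": {"p": 1.0}, "d": {"sink": 1.0}},
        "goal": {"stay": {"goal": 1.0}},
        "sink": {"stay": {"sink": 1.0}}}
owner = {"p": "max", "q": "max", "r": "min", "goal": "max", "sink": "max"}
f = {"p": 0.5, "q": 0.5, "r": 0.3, "goal": 1.0, "sink": 0.0}
result = find_msec(game, owner, f)
```

Here `result` contains the component {p, q} but nothing containing r, mirroring Example 4: removing the sub-optimal action c turns the EC into one SEC plus a non-EC state.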


**Lemma 2 (Correctness of Algorithm** 2**).** T ∈ FIND MSEC(V) *if and only if* T *is an MSEC.*

*Proof (Sketch).* "If": As T is an MSEC, all states in T have the value bestExit<sub>V</sub>(T), and hence all actions that stay inside T have this value as well. Thus, no action that stays in T is removed by Line 3, and T is still a MEC in the modified game.

"Only if": If T ∈ FIND MSEC(V), then T is a MEC of the game where the sub-optimal available actions (those in X) of Minimizer have been removed. Hence for all s ∈ T we have V(s) = bestExit<sub>V</sub>(T), because intuitively Minimizer has no possibility to influence the value any further, since all actions that could do so were in X and have been removed. Since T is a MEC in the modified game, it certainly is an EC in the original game. Hence T is a SEC. The inclusion maximality follows from the fact that we compute MECs in the modified game. Thus T is an MSEC.

*Remark 1 (Algorithm with an oracle).* In Sect. 3, we have seen that collapsing MECs does not ensure BVI convergence. Collapsing does not preserve the values, since in BECs we would be collapsing states with different values. Hence we want to collapse only MSECs, where the values are the same. If, moreover, we remove X in such a collapsed SG, then there are no (non-sink) ECs and BVI converges on this SG to the original value.

The difficulty with this algorithm is that it requires an oracle to compare values, for instance a sufficiently precise approximation of V. Consequently, we cannot pre-compute the MSECs, but have to find them while running BVI. Moreover, since the approximations converge only in the limit, we may never be able to conclude on the simplicity of some ECs. For instance, if α = β in Fig. 2 and the approximations converge at different speeds, then Algorithm 2 always outputs only a part of the EC, although the whole EC on {p, q, r} is simple.

In MDPs, all ECs are simple, because there is no second player to be resolved and all states in an EC have the same value. Thus for MDPs it suffices to collapse all MECs, in contrast to SG.

#### **4.3 Dynamic MSEC Decomposition**

Since MSECs cannot be identified from approximants of V for sure, we refrain from collapsing<sup>7</sup> and instead only decrease the over-approximation in the corresponding way. We call the method *deflating*, by which we mean decreasing the upper bound of all states in an EC to its bestExit<sub>U</sub>, see Algorithm 3. The procedure DEFLATE (called on the current upper bound U<sub>i</sub>) decreases this upper bound to the minimum possible value according to the current approximation and thus prevents states from only depending on each other, as in SECs. Intuitively, it gradually approximates SECs and performs the corresponding adjustments, but does not commit to any of the approximations.


<sup>7</sup> Our subsequent method can be combined with local collapsing whenever the lower and upper bounds on V are conclusive.
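Deflation itself is a small operation once the best exit of an EC is known. A minimal sketch, under an assumed dictionary encoding of the game (our names and numbers, not the paper's code):

```python
# `game[s]` maps actions to successor distributions {state: prob};
# `owner[s]` is "max" or "min"; T is an EC given as a set of states.

def best_exit(game, owner, T, f):
    """bestExit_f(T): the best value under f of a Maximizer action leaving T
    (0.0 if Maximizer has no exiting action, as for reachability values)."""
    exits = [sum(p * f[t] for t, p in d.items())
             for s in T if owner[s] == "max"
             for d in game[s].values()
             if any(t not in T for t in d)]       # the action exits T
    return max(exits, default=0.0)

def deflate(game, owner, T, upper):
    """Cap the upper bound of every state in the EC T at T's best exit, so
    the states of T can no longer support each other's upper bounds."""
    cap = best_exit(game, owner, T, upper)
    return {s: min(v, cap) if s in T else v for s, v in upper.items()}

# EC {p, q}: Maximizer can exit from p towards x, whose upper bound is 0.6,
# so deflation pulls the over-approximation of p and q down to 0.6.
game = {"p": {"stay": {"q": 1.0}, "exit": {"x": 1.0}},
        "q": {"stay": {"p": 1.0}},
        "x": {"stay": {"x": 1.0}}}
owner = {"p": "max", "q": "min", "x": "max"}
u = deflate(game, owner, {"p", "q"}, {"p": 1.0, "q": 1.0, "x": 0.6})
```

Without the deflation step, the self-supporting bounds of p and q would remain at 1.0 forever, which is exactly the non-convergence phenomenon described in Sect. 3.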

**Lemma 3 (**DEFLATE **is sound).** *For any* f : *S* → [0, 1] *such that* f ≥ V *and any EC* T*,* DEFLATE(T,f) ≥ V*.*

This allows us to define our BVI algorithm as the naive BVI with only the additional lines 3–4, see Algorithm 4.


**Theorem 2 (Soundness and completeness).** *Algorithm 1 (calling Algorithm 4) produces monotonic sequences* L *under- and* U *over-approximating* V*, and terminates.*

*Proof (Sketch).* The crux is to show that U converges to V. We assume, towards a contradiction, that there exists a state s with lim<sub>n→∞</sub> U<sub>n</sub>(s) − V(s) > 0. Then there exists a non-empty set of states X where the difference between lim<sub>n→∞</sub> U<sub>n</sub> and V is maximal. If the upper bound of states in X depends on states outside of X, this yields a contradiction, because then the difference between upper bound and value would decrease in the next Bellman update. So X must be an EC where all states depend on each other. However, if that is the case, calling DEFLATE decreases the upper bound to something depending on the states outside of X, thus also yielding a contradiction.

#### **Summary of Our Approach:**


#### **4.4 Learning-Based Algorithm**

*Asynchronous value iteration* selects in each round a subset T ⊆ S of states and performs the Bellman update in that round only on T. Consequently, it may speed up computation if "important" states are selected. However, with standard VI it is then even more difficult to determine the current error bound. Moreover, if some states are not selected infinitely often, the lower bound may not even converge.

In the setting of bounded value iteration, the current error bound is known for each state and thus convergence can easily be enforced. This gave rise to asynchronous VI, such as BRTDP (bounded real-time dynamic programming) in the setting of stopping MDPs [MLG05], where the states to be updated are those that appear on a simulation run. The adaptation to general MDPs [BCC+14] is very similar. When simulating a run, the probabilistic choices are resolved according to the transition probabilities. The non-deterministic choices of Maximizer are resolved by taking the "most promising action", i.e., the one with the highest U. This choice is derived from a reinforcement-learning algorithm called delayed Q-learning and ensures convergence while performing well in practice [BCC+14].

In this section, we harvest our convergence results and BVI algorithm for SG, which allow us to trivially extend the asynchronous learning-based approach of BRTDP to SGs. On the one hand, the only difference to the MDP algorithm is how to resolve the choice for Minimizer. Since the situation is dual, we again pick the "most promising action", in this case with the lowest L. On the other hand, the only difference to Algorithm 1 calling Algorithm 4 is that the Bellman updates of U and L are performed on the states of the simulation run only, see lines 2–3 of Algorithm 5.

**Algorithm 5.** Update procedure for the learning/BRTDP version of BVI on SG


If 1 or 0 is reached in a simulation, we can terminate it. However, the simulation may also cycle in an EC. Therefore, we impose a bound k on the maximum number of steps. The choice of k is discussed in detail in [BCC+14]; we use 2 · |*S*| to guarantee the possibility of reaching sinks as well as exploring new states. If the simulation cycles in an EC, the subsequent call of DEFLATE ensures that next time there is a positive probability to exit this EC. Further details can be found in [KKKW18, Appendix A.4].
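The simulation step described above can be sketched as follows. The dictionary encoding, all names, and the tiny example MDP are our illustrative assumptions, not the paper's code: Maximizer picks the "most promising" action with respect to the upper bound U, Minimizer dually with respect to the lower bound L, probabilistic transitions are sampled, and the run is cut off after k = 2·|S| steps.

```python
import random

def simulate(game, owner, L, U, s0, sinks, seed=0):
    """One BRTDP-style simulation run: returns the visited path, on whose
    states the Bellman updates of U and L would then be performed."""
    rng = random.Random(seed)
    k = 2 * len(game)                       # step bound suggested in [BCC+14]
    path, s = [s0], s0
    while s not in sinks and len(path) <= k:
        bound = U if owner[s] == "max" else L
        pick = max if owner[s] == "max" else min
        acts = game[s]
        # "most promising" action: highest U for Maximizer, lowest L for Minimizer
        a = pick(acts, key=lambda a: sum(p * bound[t] for t, p in acts[a].items()))
        succs, probs = zip(*acts[a].items())
        s = rng.choices(succs, weights=probs)[0]   # sample the probabilistic choice
        path.append(s)
    return path

# Tiny MDP: from s0, action a surely reaches `goal`, action b the `trap` sink;
# with U(goal) = 1 and U(trap) = 0, the run heads straight for the goal.
game = {"s0": {"a": {"goal": 1.0}, "b": {"trap": 1.0}},
        "goal": {"stay": {"goal": 1.0}},
        "trap": {"stay": {"trap": 1.0}}}
owner = {s: "max" for s in game}
U = {"s0": 1.0, "goal": 1.0, "trap": 0.0}
run = simulate(game, owner, L=U, U=U, s0="s0", sinks={"goal", "trap"})
```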

#### **5 Experimental Results**

We implemented both our algorithms as an extension of PRISM-games [CFK+13a], a branch of PRISM [KNP11] that allows for modelling SGs, utilizing previous work of [BCC+14, Ujm15] for MDPs and SGs with single-player ECs. We tested the implementation on the SGs from the PRISM-games case studies [gam] that have reachability properties and one additional model from [CKJ12] that was also used in [Ujm15]. We compared the results with both the explicit and the hybrid engine of PRISM-games, but since the models are small, both performed similarly, so we only display the results of the hybrid engine in Table 1.

Furthermore we ran experiments on MDPs from the PRISM benchmark suite [KNP12]. We compared our results there to the hybrid and explicit engine of PRISM, the interval iteration implemented in PRISM [HM17], the hybrid engine of Storm [DJKV17a] and the BRTDP implementation of [BCC+14].

Recall that the aim of the paper is not to provide a faster VI algorithm, but rather the first guaranteed one. Consequently, the aim of the experiments is not to show any speed ups, but to experimentally estimate the overhead needed for computing the guarantees.

For information on the technical details of the experiments, all the models and the tables for the experiments on MDPs we refer to [KKKW18, Appendix B]. Note that although some of the SG models are parametrized they could only be scaled by manually changing the model file, which complicates extensive benchmarking.

Although our approaches compute the additional upper bound to give the convergence guarantees, for each of the experiments one of our algorithms performed similarly to PRISM-games. Table 1 shows this result for three of the four SG models in the benchmarking set. On the fourth model, PRISM's precomputations already solve the problem, and hence it cannot be used to compare the approaches. For completeness, the results are displayed in [KKKW18, Appendix B.5].

**Table 1.** Experimental results for the experiments on SGs. The left two columns denote the model and the given parameters, if present. Columns 3 to 5 display the verification time in seconds for each of the solvers, namely PRISM-games (referred to as PRISM), our BVI algorithm (BVI) and our learning-based algorithm (BRTDP). The next two columns compare the number of states that BRTDP explored (#States B) to the total number of states in the model. The rightmost column shows the number of MSECs in the model.


Whenever there are few MSECs, as in mdsm and cdmsn, BVI performs like PRISM-games, because little time is spent on deflating. Apparently the additional upper bound computation takes very little time in comparison to the other tasks (e.g., parsing, generating the model, pre-computation) and does not slow down the verification significantly. For cloud, BVI is slower than PRISM-games, because there are thousands of MSECs and deflating them takes over 80% of the time. This comes from the fact that we need to compute the expensive end component decomposition for each deflating step. BRTDP performs well for cloud, because in this model, as is often the case when there are many MECs [BCC+14], only a small part of the state space is relevant for convergence. For the other models, BRTDP is slower than the deterministic approaches, because the models are so small that it is faster to first construct them completely than to explore them by simulation.

Our more extensive experiments on MDPs compare the guaranteed approaches based on collapsing (i.e., learning-based from [BCC+14] and deterministic from [HM17]) to our guaranteed approaches based on deflating (BRTDP and BVI). Since both learning-based approaches as well as both deterministic approaches perform similarly (see Table 2 in [KKKW18, Appendix B]), we conclude that collapsing and deflating are both useful for practical purposes, while the latter is also applicable to SGs. Furthermore, we compared the usual value iteration without guarantees, as implemented in PRISM's explicit engine, to BVI, and saw that our guaranteed approach did not take significantly more time in most cases. This strengthens the point that the overhead for the computation of the guarantees is negligible.

#### **6 Conclusions**

We have provided the first stopping criterion for value iteration on simple stochastic games and an anytime algorithm with bounds on the current error (guarantees on the precision of the result). The main technical challenge was that states in end components in SG can have different values, in contrast to the case of MDP. We have shown that collapsing is in general not possible, but we utilized the analysis to obtain the procedure of *deflating*, a solution on the original graph. Besides, whenever a SEC is identified for sure it can be collapsed and the two techniques of collapsing and deflating can thus be combined.

The experiments indicate that the price to pay for the overhead to compute the error bound is often negligible. For each of the available models, at least one of our two implementations has performed similarly to or better than the standard approach that yields no guarantees. Further, the obtained guarantees open the door to (e.g., learning-based) heuristics which treat only a part of the state space and can thus potentially lead to huge improvements. Surprisingly, already our straightforward adaptation of such an algorithm for MDPs to SGs yields interesting results, palliating the overhead of our non-learning method, despite the most naive implementation of deflating. Future work could reveal whether other heuristics or a more efficient implementation can lead to huge savings, as in the case of MDPs [BCC+14].

#### **References**

- [AM09] Andersson, D., Miltersen, P.B.: The complexity of solving stochastic games on graphs. In: Dong, Y., Du, D.-Z., Ibarra, O. (eds.) ISAAC 2009. LNCS, vol. 5878, pp. 112–121. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-10631-6_13
- [AY17] Arslan, G., Yüksel, S.: Decentralized Q-learning for stochastic teams and games. IEEE Trans. Autom. Control **62**(4), 1545–1558 (2017)
- [BBS08] Busoniu, L., Babuska, R., De Schutter, B.: A comprehensive survey of multiagent reinforcement learning. IEEE Trans. Syst. Man Cybern. Part C **38**(2), 156–172 (2008)
- [BT00] Brafman, R.I., Tennenholtz, M.: A near-optimal polynomial time algorithm for learning in certain classes of stochastic games. Artif. Intell. **121**(1–2), 31–47 (2000)
- [CF11] Chatterjee, K., Fijalkow, N.: A reduction from parity games to simple stochastic games. In: GandALF, pp. 74–86 (2011)
- [CH08] Chatterjee, K., Henzinger, T.A.: Value iteration. In: Grumberg, O., Veith, H. (eds.) 25 Years of Model Checking. LNCS, vol. 5000, pp. 107–138. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-69850-0_7
- [CKJ12] Calinescu, R., Kikuchi, S., Johnson, K.: Compositional reverification of probabilistic safety properties for large-scale complex IT systems. In: Calinescu, R., Garlan, D. (eds.) Monterey Workshop 2012. LNCS, vol. 7539, pp. 303–329. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-34059-8_16
- [CMG14] Cámara, J., Moreno, G.A., Garlan, D.: Stochastic game analysis and latency awareness for proactive self-adaptation. In: 9th International Symposium on Software Engineering for Adaptive and Self-Managing Systems, SEAMS 2014, Proceedings, Hyderabad, India, 2–3 June 2014, pp. 155–164 (2014)
- [Con92] Condon, A.: The complexity of stochastic games. Inf. Comput. **96**(2), 203–224 (1992)
- [CY95] Courcoubetis, C., Yannakakis, M.: The complexity of probabilistic verification. J. ACM **42**(4), 857–907 (1995)
- [gam] PRISM-games Case Studies. prismmodelchecker.org/games/casestudies.php. Accessed 18 Sept 2017
- [HK66] Hoffman, A.J., Karp, R.M.: On nonterminating stochastic games. Manag. Sci. **12**(5), 359–370 (1966)
- [HM17] Haddad, S., Monmege, B.: Interval iteration algorithm for MDPs and IMDPs. Theor. Comput. Sci. **735**, 111–131 (2018). https://doi.org/10.1016/j.tcs.2016.12.003
- [KM17] Křetínský, J., Meggendorfer, T.: Efficient strategy iteration for mean payoff in Markov decision processes. In: D'Souza, D., Narayan Kumar, K. (eds.) ATVA 2017. LNCS, vol. 10482, pp. 380–399. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-68167-2_25
- [KNP11] Kwiatkowska, M., Norman, G., Parker, D.: PRISM 4.0: verification of probabilistic real-time systems. In: Gopalakrishnan, G., Qadeer, S. (eds.) CAV 2011. LNCS, vol. 6806, pp. 585–591. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-22110-1_47
- [KNP12] Kwiatkowska, M., Norman, G., Parker, D.: The PRISM benchmark suite. In: 9th International Conference on Quantitative Evaluation of Systems (QEST 2012), pp. 203–204. IEEE (2012)
- [LaV00] LaValle, S.M.: Robot motion planning: a game-theoretic foundation. Algorithmica **26**(3–4), 430–465 (2000)
- [LL08] Li, J., Liu, W.: A novel heuristic Q-learning algorithm for solving stochastic games. In: IJCNN, pp. 1135–1144 (2008)
- [Put14] Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, Hoboken (2014)
- [SK16] Svoreňová, M., Kwiatkowska, M.: Quantitative verification and strategy synthesis for stochastic games. Eur. J. Control **30**, 15–30 (2016)
- [TT16] Tcheukam, A., Tembine, H.: One swarm per queen: a particle swarm learning for stochastic games. In: SASO, pp. 144–145 (2016)

**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **Sound Value Iteration**

Tim Quatmann and Joost-Pieter Katoen

RWTH Aachen University, Aachen, Germany
tim.quatmann@cs.rwth-aachen.de

**Abstract.** Computing reachability probabilities is at the heart of probabilistic model checking. All model checkers compute these probabilities in an iterative fashion using value iteration. This technique approximates a fixed point from below by determining reachability probabilities for an increasing number of steps. To avoid results that are significantly off, variants have recently been proposed that converge from both below and above. These procedures require starting values for both sides. We present an alternative that does not require the a priori computation of starting vectors and that converges faster on many benchmarks. The crux of our technique is to give tight and safe bounds—whose computation is cheap—on the reachability probabilities. Lifting this technique to expected rewards is trivial for both Markov chains and MDPs. Experimental results on a large set of benchmarks show its scalability and efficiency.

#### **1 Introduction**

Markov decision processes (MDPs) [1,2] have their roots in operations research and stochastic control theory. They are frequently used for stochastic and dynamic optimization problems and are widely applicable in, e.g., stochastic scheduling and robotics. MDPs are also a natural model in randomized distributed computing where coin flips by the individual processes are mixed with non-determinism arising from interleaving the processes' behaviors. The central problem for MDPs is to find a policy that determines what action to take in the light of what is known about the system at the time of choice. The typical aim is to optimize a given objective, such as minimizing the expected cost until a given number of repairs, maximizing the probability of being operational for 1,000 steps, or minimizing the probability to reach a "bad" state.

Probabilistic model checking [3,4] provides a scalable alternative to tackle these MDP problems, see the recent surveys [5,6]. The central computational issue in MDP model checking is to solve a system of linear inequalities. In absence of non-determinism—the MDP being a Markov Chain (MC)—a linear equation system is obtained. After appropriate pre-computations, such as determining the states for which no policy exists that eventually reaches the goal state, the (in)equation system has a unique solution that coincides with the extremal value

This work is partially supported by the Sino-German Center project CAP (GZ 1023).

© The Author(s) 2018

H. Chockler and G. Weissenbacher (Eds.): CAV 2018, LNCS 10981, pp. 643–661, 2018. https://doi.org/10.1007/978-3-319-96145-3\_37

that is sought for. Possible solution techniques to compute such solutions include policy iteration, linear programming, and value iteration. Modern probabilistic model checkers such as PRISM [7] and Storm [8] use value iteration by default. This approximates a fixed point from below by determining the probabilities to reach a target state within k steps in the k-th iteration. The iteration is typically stopped if the difference between the value vectors of two successive iterations (or of iterations that are further apart) is below the desired accuracy ε.

This procedure, however, can provide results that are significantly off, as the iteration is stopped prematurely, e.g., since the probability mass in the MDP only changes slightly in a series of computational steps due to a "slow" movement. This problem is not new; similar problems occur, e.g., in iterative approaches to compute long-run averages [9] and transient measures [10], and pop up in statistical model checking when deciding when to stop simulating for unbounded reachability properties [11]. As was recently shown, this phenomenon does not only occur in hypothetical cases but affects practical benchmarks of MDP model checking too [12]. To remedy this, Haddad and Monmege [13] proposed to iteratively approximate the (unique) fixed point from both below and above; a natural termination criterion is to halt the computation once the two approximations differ by less than 2·ε. This scheme requires two starting vectors, one for each approximation. For reachability probabilities, the conservative values zero and one can be used. For expected rewards, it is non-trivial to find an appropriate upper bound: how to "guess" an adequate upper bound on the expected reward to reach a goal state? Baier *et al.* [12] recently provided an algorithm to solve this issue.
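The Haddad-Monmege scheme can be illustrated for a Markov chain as follows. This is a minimal sketch under our own assumptions (dictionary encoding, example chain, and the assumption that the chain is contracting, i.e., has no end components besides the absorbing states):

```python
def interval_iteration(P, goal, zero, eps=1e-6):
    """Iterate the Bellman operator from the conservative starting vectors
    (0 for the lower, 1 for the upper approximation) and stop once the two
    approximations differ by less than 2*eps in every state."""
    lo = {s: 1.0 if s in goal else 0.0 for s in P}
    hi = {s: 0.0 if s in zero else 1.0 for s in P}   # zero: states that cannot reach goal
    def step(x):
        return {s: x[s] if s in goal or s in zero
                else sum(p * x[t] for t, p in P[s].items()) for s in P}
    while max(hi[s] - lo[s] for s in P) >= 2 * eps:
        lo, hi = step(lo), step(hi)
    # the true probability lies in [lo[s], hi[s]], so the midpoint is
    # guaranteed to be within eps of it
    return {s: (lo[s] + hi[s]) / 2 for s in P}

# Example chain: Pr(reach goal from s0) = 0.5 + 0.5 * 0.5 = 0.75.
P = {"s0": {"goal": 0.5, "s1": 0.5},
     "s1": {"goal": 0.5, "sink": 0.5},
     "goal": {"goal": 1.0}, "sink": {"sink": 1.0}}
probs = interval_iteration(P, goal={"goal"}, zero={"sink"})
```

Note that `zero` must be pre-computed by graph analysis; without setting the upper bound of those states to 0, the over-approximation would not converge in general.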

This paper takes an alternative perspective on obtaining a sound variant of value iteration. *Our approach does not require the a priori computation of starting vectors and converges faster on many benchmarks.* The crux of our technique is to give tight and safe bounds—whose computation is cheap and which are obtained during the course of value iteration—on the reachability probabilities. The approach is simple and can be lifted straightforwardly to expected rewards. The central idea is to split the desired probability of reaching a target state into the sum of (i) the probability of reaching a target state within k steps and (ii) the probability of reaching a target state only after more than k steps.


We obtain (i) via k iterations of (standard) value iteration. A second instance of value iteration computes the probability that a target state is still reachable after k steps. We show that from this information safe lower and upper bounds for (ii) can be derived. We illustrate that the same idea can be applied to expected rewards, topological value iteration [14], and Gauss-Seidel value iteration. We also discuss in detail its extension to MDPs and provide extensive experimental evaluation using our implementation in the model checker Storm [8]. Our experiments show that on many practical benchmarks we need significantly fewer iterations, yielding a speed-up of about 20% on average. More importantly though, is the conceptual simplicity of our approach.

**Fig. 1.** Example models.

#### **2 Preliminaries**

For a finite set S and vector x ∈ ℝ<sup>|S|</sup>, let x[s] ∈ ℝ denote the entry of x that corresponds to s ∈ S. Let S′ ⊆ S and a ∈ ℝ. We write x[S′] = a to denote that x[s] = a for all s ∈ S′. Given x, y ∈ ℝ<sup>|S|</sup>, x ≤ y holds iff x[s] ≤ y[s] holds for all s ∈ S. For a function f : ℝ<sup>|S|</sup> → ℝ<sup>|S|</sup> and k ≥ 0 we write f<sup>k</sup> for the function obtained by applying f k times, i.e., f<sup>0</sup>(x) = x and f<sup>k</sup>(x) = f(f<sup>k−1</sup>(x)) if k > 0.

#### **2.1 Probabilistic Models and Measures**

We briefly present probabilistic models and their properties. More details can be found in, e.g., [15].

**Definition 1 (Probabilistic Models).** *A* Markov Decision Process (MDP) *is a tuple* M = (S, *Act*, **P**, s<sub>I</sub>, ρ)*, where*

– S *is a finite set of states, Act is a finite set of actions, and* s<sub>I</sub> *is the initial state,*
– **P**: S × *Act* × S → [0, 1] *is a transition probability function satisfying* ∑<sub>s′∈S</sub> **P**(s, α, s′) ∈ {0, 1} *for all* s ∈ S, α ∈ *Act*, *and*
– ρ: S × *Act* → ℝ *is a reward function.*

M *is a* Markov Chain (MC) *if* |*Act*| = 1*.*

*Example 1.* Figure 1 shows an example MC and an example MDP.

We often simplify notation for MCs by omitting the (unique) action. For an MDP M = (S, *Act*, **P**, s<sub>I</sub>, ρ), the set of *enabled actions* of state s ∈ S is given by *Act*(s) = {α ∈ *Act* | ∑<sub>s′∈S</sub> **P**(s, α, s′) = 1}. We assume that *Act*(s) ≠ ∅ for each s ∈ S. Intuitively, upon performing action α at state s, the reward ρ(s, α) is collected and with probability **P**(s, α, s′) we move to s′ ∈ S. Notice that rewards can be positive or negative.

A state s ∈ S is called *absorbing* if **P**(s, α, s) = 1 for every α ∈ *Act*(s). A *path* of M is an infinite alternating sequence π = s<sub>0</sub>α<sub>0</sub>s<sub>1</sub>α<sub>1</sub>... where s<sub>i</sub> ∈ S, α<sub>i</sub> ∈ *Act*(s<sub>i</sub>), and **P**(s<sub>i</sub>, α<sub>i</sub>, s<sub>i+1</sub>) > 0 for all i ≥ 0. The set of paths of M is denoted by *Paths*<sup>M</sup>. The set of paths that start at s ∈ S is given by *Paths*<sup>M,s</sup>. A *finite path* π̂ = s<sub>0</sub>α<sub>0</sub>...α<sub>n−1</sub>s<sub>n</sub> is a finite prefix of a path ending with *last*(π̂) = s<sub>n</sub> ∈ S. |π̂| = n is the length of π̂, *Paths*<sup>M</sup><sub>fin</sub> is the set of finite paths of M, and *Paths*<sup>M,s</sup><sub>fin</sub> is the set of finite paths that start at state s ∈ S. We consider LTL-like notation for sets of paths. For k ∈ ℕ ∪ {∞} and G, H ⊆ S, let

$$H\,\mathcal{U}^{\leq k}\, G = \{s\_0\alpha\_0s\_1\dots \in \mathit{Paths}^{M,s\_I} \mid s\_0,\dots,s\_{j-1}\in H,\ s\_j \in G \text{ for some } j \leq k\}$$

denote the set of paths that, starting from the initial state s<sub>I</sub>, only visit states in H until after at most k steps a state in G is reached. The sets H 𝒰<sup>>k</sup> G and H 𝒰<sup>=k</sup> G are defined similarly. We use the shorthands ♦<sup>≤k</sup>G := S 𝒰<sup>≤k</sup> G, ♦G := ♦<sup>≤∞</sup>G, and □<sup>≤k</sup>G := *Paths*<sup>M,s<sub>I</sub></sup> \ ♦<sup>≤k</sup>(S \ G).

A *(deterministic) scheduler* for M is a function σ : *Paths*<sup>M</sup><sub>fin</sub> → *Act* such that σ(π̂) ∈ *Act*(*last*(π̂)) for all π̂ ∈ *Paths*<sup>M</sup><sub>fin</sub>. The set of (deterministic) schedulers for M is S<sup>M</sup>. σ ∈ S<sup>M</sup> is called *positional* if σ(π̂) only depends on the last state of π̂, i.e., for all π̂, π̂′ ∈ *Paths*<sup>M</sup><sub>fin</sub>, *last*(π̂) = *last*(π̂′) implies σ(π̂) = σ(π̂′). For MDP M and scheduler σ ∈ S<sup>M</sup>, the *probability measure* over finite paths is given by Pr<sup>M,σ</sup><sub>fin</sub> : *Paths*<sup>M,s<sub>I</sub></sup><sub>fin</sub> → [0, 1] with Pr<sup>M,σ</sup><sub>fin</sub>(s<sub>0</sub>...s<sub>n</sub>) = ∏<sup>n−1</sup><sub>i=0</sub> **P**(s<sub>i</sub>, σ(s<sub>0</sub>...s<sub>i</sub>), s<sub>i+1</sub>). The probability measure Pr<sup>M,σ</sup> over measurable sets of infinite paths is obtained via a standard cylinder set construction [15].

**Definition 2 (Reachability Probability).** *The* reachability probability *of MDP* M = (S, *Act*, **P**, s<sub>I</sub>, ρ)*,* G ⊆ S*, and* σ ∈ S<sup>M</sup> *is given by* Pr<sup>M,σ</sup>(♦G)*.*

For k ∈ ℕ ∪ {∞}, the function rew<sup>≤k</sup><sub>G</sub> : ♦G → ℝ yields the k-bounded reachability reward of a path π = s<sub>0</sub>α<sub>0</sub>s<sub>1</sub>··· ∈ ♦G. We set rew<sup>≤k</sup><sub>G</sub>(π) = ∑<sup>j−1</sup><sub>i=0</sub> ρ(s<sub>i</sub>, α<sub>i</sub>), where j = min({i ≥ 0 | s<sub>i</sub> ∈ G} ∪ {k}). We write rew<sub>G</sub> instead of rew<sup>≤∞</sup><sub>G</sub>.

**Definition 3 (Expected Reward).** *The* expected (reachability) reward *of MDP* M = (S, *Act*, **P**, s<sub>I</sub>, ρ)*,* G ⊆ S*, and* σ ∈ S<sup>M</sup> *with* Pr<sup>M,σ</sup>(♦G) = 1 *is given by the expectation* E<sup>M,σ</sup>(rew<sub>G</sub>) = ∫<sub>π∈♦G</sub> rew<sub>G</sub>(π) dPr<sup>M,σ</sup>(π)*.*

We write Pr<sup>M,σ</sup><sub>s</sub> and E<sup>M,σ</sup><sub>s</sub> for the probability measure and expectation obtained by changing the initial state of M to s ∈ S. If M is a Markov chain, there is only a single scheduler. In this case we may omit the superscript σ from Pr<sup>M,σ</sup> and E<sup>M,σ</sup>. We also omit the superscript M if it is clear from the context. The maximal reachability probability of M and G is given by Pr<sup>max</sup>(♦G) = max<sub>σ∈S<sup>M</sup></sub> Pr<sup>σ</sup>(♦G). There is a positional scheduler that attains this maximum [16]. The same holds for minimal reachability probabilities and maximal or minimal expected rewards.

*Example 2.* Consider the MDP M from Fig. 1(b). We are interested in the maximal probability to reach state s<sub>4</sub>, given by Pr<sup>max</sup>(♦{s<sub>4</sub>}). Since s<sub>4</sub> is not reachable from s<sub>3</sub>, we have Pr<sup>max</sup><sub>s<sub>3</sub></sub>(♦{s<sub>4</sub>}) = 0. Intuitively, choosing action β at state s<sub>0</sub> makes reaching s<sub>3</sub> more likely, which should be avoided in order to maximize the probability to reach s<sub>4</sub>. We therefore assume a scheduler σ that always chooses action α at state s<sub>0</sub>. Starting from the initial state s<sub>0</sub>, we then eventually take the transition from s<sub>2</sub> to s<sub>3</sub> or the transition from s<sub>2</sub> to s<sub>4</sub>, with probability one. The resulting probability to reach s<sub>4</sub> is given by Pr<sup>max</sup>(♦{s<sub>4</sub>}) = Pr<sup>σ</sup>(♦{s<sub>4</sub>}) = 0.3/(0.1 + 0.3) = 0.75.

#### **2.2 Probabilistic Model Checking via Interval Iteration**

In the following we present approaches to compute reachability probabilities and expected rewards. We consider approximative computations; exact computations are handled in, e.g., [17,18]. For the sake of clarity, we focus on reachability probabilities and sketch how the techniques can be lifted to expected rewards.

**Reachability Probabilities.** We fix an MDP M = (S, *Act*, **P**, s<sub>I</sub>, ρ), a set of goal states G ⊆ S, and a precision parameter ε > 0.

*Problem 1.* Compute an ε-approximation of the maximal reachability probability Pr<sup>max</sup>(♦G), i.e., compute a value r ∈ [0, 1] with |r − Pr<sup>max</sup>(♦G)| < ε.

We briefly sketch how to compute such a value r via *interval iteration* [12,13,19]. The computation for minimal reachability probabilities is analogous.

W.l.o.g. it is assumed that the states in G are absorbing. Using graph algorithms, we compute S<sub>0</sub> = {s ∈ S | Pr<sup>max</sup><sub>s</sub>(♦G) = 0} and partition the state space of M into S = S<sub>0</sub> ∪̇ G ∪̇ S<sub>?</sub> with S<sub>?</sub> = S \ (G ∪ S<sub>0</sub>). If s<sub>I</sub> ∈ S<sub>0</sub> or s<sub>I</sub> ∈ G, the probability Pr<sup>max</sup>(♦G) is 0 or 1, respectively. From now on we assume s<sub>I</sub> ∈ S<sub>?</sub>.

We say that M is *contracting* with respect to $S' \subseteq S$ if $\Pr^{\sigma}_s(\Diamond S') = 1$ for all $s \in S$ and all $\sigma \in \mathfrak{S}^{\mathcal{M}}$. We assume that M is contracting with respect to $G \cup S_0$. Otherwise, we apply a transformation on the so-called *end components*<sup>1</sup> of M, yielding a contracting MDP $\mathcal{M}'$ with the same maximal reachability probability as M. Roughly, this transformation replaces each end component of M with a single state whose enabled actions coincide with the actions that previously led outside of the end component. This step is detailed in [13,19].

We have $x^*[s] = \Pr^{\max}_s(\Diamond G)$ for $s \in S$ and the unique fixpoint $x^*$ of the function $f\colon \mathbb{R}^{|S|} \to \mathbb{R}^{|S|}$ with $f(x)[S_0] = 0$, $f(x)[G] = 1$, and

$$f(x)[s] = \max\_{\alpha \in Act(s)} \sum\_{s' \in S} \mathbf{P}(s, \alpha, s') \cdot x[s']$$

for $s \in S_?$. Hence, computing $\Pr^{\max}(\Diamond G)$ reduces to finding the fixpoint of $f$.

A popular technique for this purpose is the *value iteration* algorithm [1]. Given a starting vector $x \in \mathbb{R}^{|S|}$ with $x[S_0] = 0$ and $x[G] = 1$, standard value iteration computes $f^k(x)$ for increasing $k$ until $\max_{s \in S} |f^k(x)[s] - f^{k-1}(x)[s]| < \varepsilon$ holds for a predefined precision $\varepsilon > 0$. As pointed out in, e.g., [13], there is no

<sup>1</sup> Intuitively, an end component is a set of states $S' \subseteq S$ such that there is a scheduler under which, from any $s \in S'$, exactly the states in $S'$ are visited infinitely often.

guarantee on the preciseness of the result $r = f^k(x)[s_I]$, i.e., standard value iteration does not give any evidence on the error $|r - \Pr^{\max}(\Diamond G)|$. The intuitive reason is that value iteration only approximates the fixpoint $x^*$ from one side, yielding no indication of the distance between the current result and $x^*$.
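To make this one-sidedness concrete, here is a minimal sketch of standard value iteration on an assumed toy MC (our own illustration, not the MC of Fig. 1): a single $S_?$ state $s_0$ that stays with probability 0.99, reaches the goal with probability 0.006, and gets stuck otherwise, so $\Pr_{s_0}(\Diamond G) = 0.006/(0.006+0.004) = 0.6$.

```python
# A sketch of standard value iteration on an assumed toy chain (our own
# example): one S_? state s0, stay prob. 0.99, goal prob. 0.006, stuck
# prob. 0.004. The residual stopping criterion gives no error guarantee.

def standard_value_iteration(p_stay=0.99, p_goal=0.006, eps=1e-6):
    x_prev = 0.0
    while True:
        x = p_stay * x_prev + p_goal  # f(x)[s0]; the goal contributes p_goal * 1
        if abs(x - x_prev) < eps:     # residual-based stopping criterion
            return x
        x_prev = x

r = standard_value_iteration()
true_prob = 0.6
# r meets the residual criterion, yet the true error |r - 0.6| is still
# roughly 1e-4 -- two orders of magnitude above eps.
```

With a slowly contracting chain like this, the residual between consecutive iterates underestimates the true error by a factor of roughly $q/(1-q) = 99$, which is exactly the effect the examples below exhibit in PRISM and Storm.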

*Example 3.* Consider the MDP M from Fig. 1(b). We invoked standard value iteration in PRISM [7] and Storm [8] to compute the reachability probability $\Pr^{\max}(\Diamond\{s_4\})$. Recall from Example 2 that the correct solution is 0.75. With (absolute) precision $\varepsilon = 10^{-6}$, both model checkers returned 0.7248. Notice that the user can improve the precision by considering, e.g., $\varepsilon = 10^{-8}$, which yields 0.7497. However, there is no guarantee on the preciseness of a given result.

The *interval iteration* algorithm [12,13,19] addresses the impreciseness of value iteration. The idea is to approach the fixpoint $x^*$ from below and from above. The first step is to find starting vectors $x_\ell, x_u \in \mathbb{R}^{|S|}$ satisfying $x_\ell[S_0] = x_u[S_0] = 0$, $x_\ell[G] = x_u[G] = 1$, and $x_\ell \le x^* \le x_u$. As the entries of $x^*$ are probabilities, it is always valid to set $x_\ell[S_?] = 0$ and $x_u[S_?] = 1$. We have $f^k(x_\ell) \le x^* \le f^k(x_u)$ for any $k \ge 0$. Interval iteration computes $f^k(x_\ell)$ and $f^k(x_u)$ for increasing $k$ until $\max_{s \in S} |f^k(x_\ell)[s] - f^k(x_u)[s]| < 2\varepsilon$. For the result $r = \frac{1}{2}(f^k(x_\ell)[s_I] + f^k(x_u)[s_I])$ we obtain $|r - \Pr^{\max}(\Diamond G)| < \varepsilon$, i.e., we get a sound approximation of the maximal reachability probability.
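The two-sided scheme can be sketched on an assumed toy MC (our own illustration, not the model of Fig. 1): one $S_?$ state with stay-probability 0.99 and goal-probability 0.006, so the true value is $0.006/(0.006+0.004) = 0.6$.

```python
# A sketch of interval iteration on an assumed toy chain (our own
# example): iterate f from x_l (0 on S_?) and x_u (1 on S_?); the two
# sequences bracket the fixpoint, so stopping at width < 2*eps is sound.

def interval_iteration(p_stay=0.99, p_goal=0.006, eps=1e-6):
    lo, up = 0.0, 1.0              # x_l[s0] and x_u[s0]
    while up - lo >= 2 * eps:
        lo = p_stay * lo + p_goal  # f^k(x_l)[s0], approaches x* from below
        up = p_stay * up + p_goal  # f^k(x_u)[s0], approaches x* from above
    return (lo + up) / 2           # midpoint: |r - x*| < eps is guaranteed

r = interval_iteration()
```

Unlike the residual criterion of standard value iteration, the bracket width is a certified bound on the error of the returned midpoint.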

*Example 4.* We invoked interval iteration in PRISM and Storm to compute the reachability probability $\Pr^{\max}(\Diamond\{s_4\})$ for the MDP M from Fig. 1(b). Both implementations correctly yield an $\varepsilon$-approximation of $\Pr^{\max}(\Diamond\{s_4\})$, where we considered $\varepsilon = 10^{-6}$. However, both PRISM and Storm required roughly 300,000 iterations to converge.

**Expected Rewards.** Whereas [13,19] only consider reachability probabilities, [12] extends interval iteration to compute expected rewards. Let M be an MDP and G be a set of absorbing states such that M is contracting with respect to G.

*Problem 2.* Compute an $\varepsilon$-approximation of the maximal expected reachability reward $\mathbb{E}^{\max}(\Diamond G)$, i.e., compute a value $r \in \mathbb{R}$ with $|r - \mathbb{E}^{\max}(\Diamond G)| < \varepsilon$.

We have $x^*[s] = \mathbb{E}^{\max}_s(\Diamond G)$ for the unique fixpoint $x^*$ of $g\colon \mathbb{R}^{|S|} \to \mathbb{R}^{|S|}$ with

$$g(x)[G] = 0 \quad \text{and} \quad g(x)[s] = \max\_{\alpha \in Act(s)} \rho(s, \alpha) + \sum\_{s' \in S} \mathbf{P}(s, \alpha, s') \cdot x[s']$$

for $s \notin G$. As for reachability probabilities, interval iteration can be applied to approximate this fixpoint. The crux lies in finding appropriate starting vectors $x_\ell, x_u \in \mathbb{R}^{|S|}$ guaranteeing $x_\ell \le x^* \le x_u$. To this end, [12] describes graph-based algorithms that give an upper bound on the expected number of times each individual state $s \in S \setminus G$ is visited. This then yields an approximation of the expected amount of reward collected at the various states.

#### **3 Sound Value Iteration for MCs**

We present an algorithm for computing reachability probabilities and expected rewards as in Problems 1 and 2. The algorithm is an alternative to the interval iteration approach [12,20] but (i) does not require an a priori computation of starting vectors $x_\ell, x_u \in \mathbb{R}^{|S|}$ and (ii) converges faster on many practical benchmarks, as shown in Sect. 5. For the sake of simplicity, we first restrict ourselves to computing reachability probabilities on MCs.

In the following, let D = (S, **P**, $s_I$, $\rho$) be an MC, $G \subseteq S$ a set of absorbing goal states, and $\varepsilon > 0$ a precision parameter. We consider the partition $S = S_0 \mathbin{\dot\cup} G \mathbin{\dot\cup} S_?$ as in Sect. 2.2. The following theorem captures the key insight of our algorithm.

**Theorem 1.** *For MC* D *let* G *and* $S_?$ *be as above and* $k \ge 0$ *with* $\Pr_s(\Box^{\le k} S_?) < 1$ *for all* $s \in S_?$*. We have*

$$\Pr(\Diamond^{\leq k}G) + \Pr(\Box^{\leq k}S\_{?}) \cdot \min\_{s \in S\_{?}} \frac{\Pr\_s(\Diamond^{\leq k}G)}{1 - \Pr\_s(\Box^{\leq k}S\_{?})}$$

$$\leq \Pr(\Diamond G) \leq \Pr(\Diamond^{\leq k}G) + \Pr(\Box^{\leq k}S\_{?}) \cdot \max\_{s \in S\_{?}} \frac{\Pr\_s(\Diamond^{\leq k}G)}{1 - \Pr\_s(\Box^{\leq k}S\_{?})}.$$

Theorem 1 allows us to approximate $\Pr(\Diamond G)$ by computing, for increasing $k \in \mathbb{N}$,

– $\Pr(\Diamond^{\le k} G)$, the probability to reach a state in G within k steps, and

– $\Pr(\Box^{\le k} S_?)$, the probability to stay in $S_?$ during the first k steps.

This can be realized via a value-iteration based procedure. The obtained bounds on $\Pr(\Diamond G)$ can be tightened arbitrarily since $\Pr(\Box^{\le k} S_?)$ approaches 0 for increasing k. In the following, we address the correctness of Theorem 1, describe the details of our algorithm, and indicate how the results can be lifted to expected rewards.

#### **3.1 Approximating Reachability Probabilities**

To approximate the reachability probability $\Pr(\Diamond G)$, we consider the step-bounded reachability probability $\Pr(\Diamond^{\le k} G)$ for $k \ge 0$ and provide a lower and an upper bound for the 'missing' probability $\Pr(\Diamond G) - \Pr(\Diamond^{\le k} G)$. Note that $\Diamond G$ is the disjoint union of the paths that reach G *within* k steps (given by $\Diamond^{\le k} G$) and the paths that reach G only *after* k steps (given by $S_? \,\mathcal{U}^{>k}\, G$).

**Lemma 1.** *For any* $k \ge 0$ *we have* $\Pr(\Diamond G) = \Pr(\Diamond^{\le k} G) + \Pr(S_? \,\mathcal{U}^{>k}\, G)$*.*

A path $\pi \in S_? \,\mathcal{U}^{>k}\, G$ reaches some state $s \in S_?$ after *exactly* k steps. This yields the partition $S_? \,\mathcal{U}^{>k}\, G = \dot\bigcup_{s \in S_?} (S_? \,\mathcal{U}^{=k}\, \{s\} \cap \Diamond G)$. It follows that

$$\Pr(S\_? \,\mathcal{U}^{>k}\, G) = \sum\_{s\in S\_?} \Pr(S\_? \,\mathcal{U}^{=k}\,\{s\}) \cdot \Pr\_{s}(\Diamond G).$$

Consider $\ell, u \in [0, 1]$ with $\ell \le \Pr_s(\Diamond G) \le u$ for all $s \in S_?$, i.e., $\ell$ and $u$ are lower and upper bounds for the reachability probabilities within $S_?$. We have

$$\sum\_{s \in S\_?} \Pr(S\_? \,\mathcal{U}^{=k}\,\{s\}) \cdot \Pr\_{s}(\Diamond G) \le \sum\_{s \in S\_?} \Pr(S\_? \,\mathcal{U}^{=k}\,\{s\}) \cdot u = \Pr(\Box^{\le k} S\_?) \cdot u.$$

We can argue similarly for the lower bound $\ell$. With Lemma 1 we get the following.

**Proposition 1.** *For MC* D *with* G*,* $S_?$*,* $\ell$*,* u *as above and any* $k \ge 0$ *we have*

$$\Pr(\Diamond^{\leq k}G) + \Pr(\Box^{\leq k}S\_?) \cdot \ell \leq \Pr(\Diamond G) \leq \Pr(\Diamond^{\leq k}G) + \Pr(\Box^{\leq k}S\_?) \cdot u.$$

*Remark 1.* The bounds for $\Pr(\Diamond G)$ given by Proposition 1 are similar to the bounds obtained after performing k iterations of interval iteration with starting vectors $x_\ell, x_u \in \mathbb{R}^{|S|}$, where $x_\ell[S_?] = \ell$ and $x_u[S_?] = u$.

We now discuss how the bounds $\ell$ and $u$ can be obtained from the step-bounded probabilities $\Pr_s(\Diamond^{\le k} G)$ and $\Pr_s(\Box^{\le k} S_?)$ for $s \in S_?$. We focus on the upper bound $u$. The reasoning for the lower bound $\ell$ is similar.

Let $s_{\max} \in S_?$ be a state with maximal reachability probability, that is, $s_{\max} \in \arg\max_{s \in S_?} \Pr_s(\Diamond G)$. From Proposition 1 we get

$$\operatorname{Pr}\_{s\_{\max}}(\Diamond G) \le \operatorname{Pr}\_{s\_{\max}}(\Diamond^{\leq k} G) + \operatorname{Pr}\_{s\_{\max}}(\Box^{\leq k} S\_?) \cdot \operatorname{Pr}\_{s\_{\max}}(\Diamond G).$$

We solve the inequality for $\Pr_{s_{\max}}(\Diamond G)$ (assuming $\Pr_s(\Box^{\le k} S_?) < 1$ for all $s \in S_?$):

$$\Pr\_{s\_{\text{max}}}(\Diamond G) \le \frac{\Pr\_{s\_{\text{max}}}(\Diamond^{\leq k} G)}{1 - \Pr\_{s\_{\text{max}}}(\Box^{\leq k} S\_?)} \le \max\_{s \in S\_?} \frac{\Pr\_s(\Diamond^{\leq k} G)}{1 - \Pr\_s(\Box^{\leq k} S\_?)}.$$

**Proposition 2.** *For MC* D *let* G *and* $S_?$ *be as above and* $k \ge 0$ *such that* $\Pr_s(\Box^{\le k} S_?) < 1$ *for all* $s \in S_?$*. For every* $\hat{s} \in S_?$ *we have*

$$\min\_{s \in S\_?} \frac{\Pr\_s(\Diamond^{\le k} G)}{1 - \Pr\_s(\Box^{\le k} S\_?)} \le \Pr\_{\hat{s}}(\Diamond G) \le \max\_{s \in S\_?} \frac{\Pr\_s(\Diamond^{\le k} G)}{1 - \Pr\_s(\Box^{\le k} S\_?)}.$$

Theorem 1 is a direct consequence of Propositions 1 and 2.

#### **3.2 Extending the Value Iteration Approach**

Recall the standard value iteration algorithm for approximating $\Pr(\Diamond G)$ as discussed in Sect. 2.2. For MCs, the function $f\colon \mathbb{R}^{|S|} \to \mathbb{R}^{|S|}$ simplifies to $f(x)[S_0] = 0$, $f(x)[G] = 1$, and $f(x)[s] = \sum_{s' \in S} \mathbf{P}(s, s') \cdot x[s']$ for $s \in S_?$. We can compute the k-step bounded reachability probability at every state $s \in S$

```
Input : MC D = (S, P, s_I, ρ), absorbing states G ⊆ S, precision ε > 0
Output: r ∈ R with |r − Pr(♦G)| < ε
 1  S_? ← S \ ({s ∈ S | Pr_s(♦G) = 0} ∪ G)
 2  initialize x_0, y_0 ∈ R^|S| with x_0[G] = 1, x_0[S \ G] = 0, y_0[S_?] = 1, y_0[S \ S_?] = 0
 3  ℓ_0 ← −∞;  u_0 ← +∞
 4  k ← 0
 5  repeat
 6      k ← k + 1
 7      x_k ← f(x_{k−1});  y_k ← h(y_{k−1})
 8      if y_k[s] < 1 for all s ∈ S_? then
 9          ℓ_k ← max(ℓ_{k−1}, min_{s∈S_?} x_k[s] / (1 − y_k[s]));  u_k ← min(u_{k−1}, max_{s∈S_?} x_k[s] / (1 − y_k[s]))
10  until y_k[s_I] · (u_k − ℓ_k) < 2ε
11  return x_k[s_I] + y_k[s_I] · (ℓ_k + u_k) / 2
```

```
Algorithm 1: Sound value iteration for MCs.
```
by performing k iterations of value iteration [15, Remark 10.104]. More precisely, when applying f k times on a starting vector $x \in \mathbb{R}^{|S|}$ with $x[G] = 1$ and $x[S \setminus G] = 0$, we get $\Pr_s(\Diamond^{\le k} G) = f^k(x)[s]$. The probabilities $\Pr_s(\Box^{\le k} S_?)$ for $s \in S$ can be computed similarly. Let $h\colon \mathbb{R}^{|S|} \to \mathbb{R}^{|S|}$ with $h(y)[S \setminus S_?] = 0$ and $h(y)[s] = \sum_{s' \in S} \mathbf{P}(s, s') \cdot y[s']$ for $s \in S_?$. For a starting vector $y \in \mathbb{R}^{|S|}$ with $y[S_?] = 1$ and $y[S \setminus S_?] = 0$ we get $\Pr_s(\Box^{\le k} S_?) = h^k(y)[s]$.

Algorithm 1 depicts our approach. It maintains vectors $x_k, y_k \in \mathbb{R}^{|S|}$ which, after k iterations of the loop, store the k-step bounded probabilities $\Pr_s(\Diamond^{\le k} G)$ and $\Pr_s(\Box^{\le k} S_?)$, respectively. Additionally, the algorithm considers lower bounds $\ell_k$ and upper bounds $u_k$ such that the following invariant holds.

**Lemma 2.** *After executing the loop of Algorithm 1* k *times we have, for all* $s \in S_?$*, that* $x_k[s] = \Pr_s(\Diamond^{\le k} G)$*,* $y_k[s] = \Pr_s(\Box^{\le k} S_?)$*, and* $\ell_k \le \Pr_s(\Diamond G) \le u_k$*.*

The correctness of the algorithm follows from Theorem 1. Termination is guaranteed since $\Pr(\Diamond(S_0 \cup G)) = 1$ and therefore $\lim_{k\to\infty} \Pr(\Box^{\le k} S_?) = \Pr(\Box S_?) = 0$.

**Theorem 2.** *Algorithm 1 terminates for any MC* D*, goal states* G*, and precision* ε > 0*. The returned value* r *satisfies* |r − Pr(♦G)| < ε*.*

*Example 5.* We apply Algorithm 1 to the MC in Fig. 1(a) and the set of goal states $G = \{s_4\}$. We have $S_? = \{s_0, s_1, s_2\}$. After $k = 3$ iterations it holds that

$$\begin{aligned} x\_3[s\_0] &= 0.00003 & x\_3[s\_1] &= 0.003 & x\_3[s\_2] &= 0.3\\ y\_3[s\_0] &= 0.99996 & y\_3[s\_1] &= 0.996 & y\_3[s\_2] &= 0.6 \end{aligned}$$

Hence, $\frac{x_3[s]}{1 - y_3[s]} = \frac{3}{4} = 0.75$ for all $s \in S_?$. We get $\ell_3 = u_3 = 0.75$. The algorithm converges for any $\varepsilon > 0$ and returns the correct solution $x_3[s_0] + y_3[s_0] \cdot 0.75 = 0.75$.
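Algorithm 1 can be transcribed almost line by line. The following sketch runs it on an assumed three-state toy MC (our own illustration, not the MC of Fig. 1): one $S_?$ state, an absorbing goal, and an absorbing sink, so $\Pr_{s_0}(\Diamond G) = 0.3/(0.3+0.2) = 0.6$.

```python
# A sketch of sound value iteration (Algorithm 1) on an assumed toy MC
# (our own example): state 0 is the only S_? state, 1 the goal, 2 a sink.
P = {
    0: [(0, 0.5), (1, 0.3), (2, 0.2)],  # stay, reach goal, get stuck
    1: [(1, 1.0)],                      # absorbing goal
    2: [(2, 1.0)],                      # absorbing sink (S_0)
}

def sound_value_iteration(P, S_q, goal, s_init, eps=1e-6):
    n = len(P)
    x = [1.0 if s in goal else 0.0 for s in range(n)]  # Pr_s(goal within k steps)
    y = [1.0 if s in S_q else 0.0 for s in range(n)]   # Pr_s(stay in S_? for k steps)
    lo, up = float("-inf"), float("inf")
    while True:
        # apply f and h once (x and y are read before being rebound)
        x = [sum(p * x[t] for t, p in P[s]) if s in S_q else x[s] for s in range(n)]
        y = [sum(p * y[t] for t, p in P[s]) if s in S_q else 0.0 for s in range(n)]
        if all(y[s] < 1.0 for s in S_q):
            lo = max(lo, min(x[s] / (1 - y[s]) for s in S_q))  # tighten bounds
            up = min(up, max(x[s] / (1 - y[s]) for s in S_q))  # via Theorem 1
        if y[s_init] * (up - lo) < 2 * eps:  # sound termination criterion
            return x[s_init] + y[s_init] * (lo + up) / 2

r = sound_value_iteration(P, S_q=[0], goal={1}, s_init=0)
```

As in Example 5, with a single $S_?$ state the lower and upper bound coincide after one iteration and the exact value 0.6 is returned.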

#### **3.3 Sound Value Iteration for Expected Rewards**

We lift our approach to expected rewards in a straightforward manner. Let $G \subseteq S$ be a set of absorbing goal states of MC D such that $\Pr(\Diamond G) = 1$. Further, let $S_? = S \setminus G$. For $k \ge 0$ we observe that the expected reward $\mathbb{E}(\Diamond G)$ can be split into the expected reward collected within k steps and the expected reward collected only after k steps, i.e., $\mathbb{E}(\Diamond G) = \mathbb{E}(\Diamond^{\le k} G) + \sum_{s \in S_?} \Pr(S_? \,\mathcal{U}^{=k}\, \{s\}) \cdot \mathbb{E}_s(\Diamond G)$. Following a similar reasoning as in Sect. 3.1, we can show the following.

**Theorem 3.** *For MC* D *let* G *and* $S_?$ *be as before and* $k \ge 0$ *such that* $\Pr_s(\Box^{\le k} S_?) < 1$ *for all* $s \in S_?$*. We have*

$$\begin{aligned} \mathbb{E}(\Diamond^{\leq k}G) + \Pr(\Box^{\leq k}S\_{?}) \cdot \min\_{s\in S\_{?}} & \frac{\mathbb{E}\_{s}(\Diamond^{\leq k}G)}{1 - \Pr\_{s}(\Box^{\leq k}S\_{?})} \\ \leq \mathbb{E}(\Diamond G) \leq \mathbb{E}(\Diamond^{\leq k}G) + \Pr(\Box^{\leq k}S\_{?}) \cdot \max\_{s\in S\_{?}} & \frac{\mathbb{E}\_{s}(\Diamond^{\leq k}G)}{1 - \Pr\_{s}(\Box^{\leq k}S\_{?})} .\end{aligned}$$

Recall the function $g\colon \mathbb{R}^{|S|} \to \mathbb{R}^{|S|}$ from Sect. 2.2, given by $g(x)[G] = 0$ and $g(x)[s] = \rho(s) + \sum_{s' \in S} \mathbf{P}(s, s') \cdot x[s']$ for $s \in S_?$. For $s \in S$ and $x \in \mathbb{R}^{|S|}$ with $x[S] = 0$ we have $\mathbb{E}_s(\Diamond^{\le k} G) = g^k(x)[s]$. We modify Algorithm 1 such that it considers the function g instead of the function f. Then, the returned value r satisfies $|r - \mathbb{E}(\Diamond G)| < \varepsilon$.
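As a minimal illustration of the reward variant, consider an assumed toy MC (our own example, not from the paper's figures): a single $S_?$ state $s_0$ that collects reward 1 per step and moves to the goal with probability 0.5, so the true expected reward is $1/(1-0.5) = 2$.

```python
# A sketch of the expected-reward variant of Algorithm 1 on an assumed
# one-S_?-state toy MC: iterate g (instead of f) alongside h and apply
# the bounds of Theorem 3.

def sound_vi_rewards(p_stay=0.5, reward=1.0, eps=1e-6):
    x, y = 0.0, 1.0              # E_{s0}(k-step reward), Pr_{s0}(stay k steps)
    lo, up = float("-inf"), float("inf")
    while y * (up - lo) >= 2 * eps:
        x = reward + p_stay * x  # g(x)[s0] from Sect. 2.2
        y = p_stay * y           # h(y)[s0]
        if y < 1.0:
            lo = max(lo, x / (1 - y))  # lower bound from Theorem 3
            up = min(up, x / (1 - y))  # upper bound from Theorem 3
    return x + y * (lo + up) / 2

r = sound_vi_rewards()
```

With a single $S_?$ state the minimum and maximum in Theorem 3 coincide, so the bounds collapse after one iteration and the exact value 2 is returned.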

#### **3.4 Optimizations**

Algorithm 1 can make use of *initial bounds* $\ell_0, u_0 \in \mathbb{R}$ with $\ell_0 \le \Pr_s(\Diamond G) \le u_0$ for all $s \in S_?$. Such bounds could be derived, e.g., from domain knowledge or during preprocessing [12]. The algorithm always chooses the largest available lower bound for $\ell_k$ and the smallest available upper bound for $u_k$, respectively. If Algorithm 1 and interval iteration are initialized with the same bounds, Algorithm 1 requires at most as many iterations as interval iteration (cf. Remark 1).

*Gauss-Seidel value iteration* [1,12] is an optimization of standard value iteration and interval iteration that potentially leads to faster convergence. When computing $f(x)[s]$ for $s \in S_?$, the idea is to use already computed results $f(x)[s']$ from the current iteration. Formally, let $\prec\, \subseteq S \times S$ be some strict total ordering of the states. Gauss-Seidel value iteration considers, instead of the function f, the function $f_\prec\colon \mathbb{R}^{|S|} \to \mathbb{R}^{|S|}$ with $f_\prec(x)[S_0] = 0$, $f_\prec(x)[G] = 1$, and

$$f\_{\prec}(x)[s] = \sum\_{s' \prec s} \mathbf{P}(s, s') \cdot f\_{\prec}(x)[s'] + \sum\_{s' \not\prec s} \mathbf{P}(s, s') \cdot x[s'].$$

Values f≺(x)[s] for s ∈ S are computed in the order defined by ≺. This idea can also be applied to our approach. To this end, we replace f by f<sup>≺</sup> and h by h≺, where h<sup>≺</sup> is defined similarly. More details are given in [21].
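The in-sweep reuse can be sketched on an assumed two-$S_?$-state example (our own; $P(s_0, s_1) = P(s_0, \text{goal}) = 0.5$ and $P(s_1, s_0) = P(s_1, \text{sink}) = 0.5$, so the exact values are $2/3$ and $1/3$). For brevity the sketch uses the unsound residual stopping criterion of standard value iteration; the point here is only the $f_\prec$ update.

```python
# A sketch of the Gauss-Seidel update f_< on an assumed toy MC (our own
# example): values updated earlier in a sweep are immediately reused by
# later states, following the chosen total order s1 < s0.

def gauss_seidel(eps=1e-10):
    v = {"s0": 0.0, "s1": 0.0}
    while True:
        delta = 0.0
        for s in ("s1", "s0"):  # sweep in the order given by <
            new = 0.5 * v["s0"] if s == "s1" else 0.5 * v["s1"] + 0.5
            delta = max(delta, abs(new - v[s]))
            v[s] = new          # visible to later states in the same sweep
        if delta < eps:         # residual criterion, only for this sketch
            return v

v = gauss_seidel()
```

Because the $s_0$-update already sees the fresh $s_1$-value, each sweep contracts by a factor of $1/4$ here instead of $1/2$, which is the source of the potential speedup.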

*Topological value iteration* [14] exploits the graph structure of the MC D. The idea is to decompose the states S of D into strongly connected components<sup>2</sup>

<sup>2</sup> $S' \subseteq S$ is a connected component if $s'$ can be reached from $s$ for all $s, s' \in S'$. $S'$ is a strongly connected component if no proper superset of $S'$ is a connected component.

(SCCs) that are analyzed individually. The procedure can improve the runtime of classical value iteration since for a single iteration only the values for the current SCC have to be updated. A topological variant of interval iteration is introduced in [12]. Given these results, sound value iteration can be extended similarly.

#### **4 Sound Value Iteration for MDPs**

We extend sound value iteration to compute reachability probabilities in MDPs. Assume an MDP M = (S, *Act*, **P**, $s_I$, $\rho$) and a set of absorbing goal states G. For simplicity, we focus on maximal reachability probabilities, i.e., we compute $\Pr^{\max}(\Diamond G)$. Minimal reachability probabilities and expected rewards are analogous. As in Sect. 2.2 we consider the partition $S = S_0 \mathbin{\dot\cup} G \mathbin{\dot\cup} S_?$ such that M is contracting with respect to $G \cup S_0$.

#### **4.1 Approximating Maximal Reachability Probabilities**

We argue that our results for MCs also hold for MDPs under a given scheduler $\sigma \in \mathfrak{S}^{\mathcal{M}}$. Let $k \ge 0$ such that $\Pr^{\sigma}_s(\Box^{\le k} S_?) < 1$ for all $s \in S_?$. Following the reasoning of Sect. 3.1, we get

$$\Pr^{\sigma}(\Diamond^{\leq k}G) + \Pr^{\sigma}(\Box^{\leq k}S\_?) \cdot \min\_{s \in S\_?} \frac{\Pr^{\sigma}\_s(\Diamond^{\leq k}G)}{1 - \Pr^{\sigma}\_s(\Box^{\leq k}S\_?)} \leq \Pr^{\sigma}(\Diamond G) \leq \Pr^{\max}(\Diamond G).$$

Next, assume an upper bound $u \in \mathbb{R}$ with $\Pr^{\max}_s(\Diamond G) \le u$ for all $s \in S_?$. For a scheduler $\sigma_{\max} \in \mathfrak{S}^{\mathcal{M}}$ that attains the maximal reachability probability, i.e., $\sigma_{\max} \in \arg\max_{\sigma \in \mathfrak{S}^{\mathcal{M}}} \Pr^{\sigma}(\Diamond G)$, it holds that

$$\begin{split} \Pr^{\max}(\Diamond G) &= \Pr^{\sigma\_{\max}}(\Diamond G) \leq \Pr^{\sigma\_{\max}}(\Diamond^{\leq k} G) + \Pr^{\sigma\_{\max}}(\Box^{\leq k} S\_?) \cdot u \\ &\leq \max\_{\sigma \in \mathfrak{S}^{\mathcal{M}}} \left( \Pr^{\sigma}(\Diamond^{\leq k} G) + \Pr^{\sigma}(\Box^{\leq k} S\_?) \cdot u \right) .\end{split}$$

We obtain the following theorem which is the basis of our algorithm.

**Theorem 4.** *For MDP* M *let* G*,* $S_?$*, and* u *be as above. Assume* $\sigma \in \mathfrak{S}^{\mathcal{M}}$ *and* $k \ge 0$ *such that* $\sigma \in \arg\max_{\sigma' \in \mathfrak{S}^{\mathcal{M}}} \Pr^{\sigma'}(\Diamond^{\le k} G) + \Pr^{\sigma'}(\Box^{\le k} S_?) \cdot u$ *and* $\Pr^{\sigma}_s(\Box^{\le k} S_?) < 1$ *for all* $s \in S_?$*. We have*

$$\begin{aligned} &\Pr^{\sigma}(\Diamond^{\leq k}G) + \Pr^{\sigma}(\Box^{\leq k}S\_{?}) \cdot \min\_{s \in S\_{?}} \frac{\Pr^{\sigma}\_{s}(\Diamond^{\leq k}G)}{1 - \Pr^{\sigma}\_{s}(\Box^{\leq k}S\_{?})} \\ &\leq \Pr^{\max}(\Diamond G) \leq \Pr^{\sigma}(\Diamond^{\leq k}G) + \Pr^{\sigma}(\Box^{\leq k}S\_{?}) \cdot u. \end{aligned}$$

Similar to the results for MCs, it also holds that $\Pr^{\max}(\Diamond G) \le \max_{\sigma \in \mathfrak{S}^{\mathcal{M}}} \hat{u}^{\sigma}_k$ with

$$
\hat{u}\_k^{\sigma} := \Pr^{\sigma}(\Diamond^{\leq k} G) + \Pr^{\sigma}(\Box^{\leq k} S\_?) \cdot \max\_{s \in S\_?} \frac{\Pr\_s^{\sigma}(\Diamond^{\leq k} G)}{1 - \Pr\_s^{\sigma}(\Box^{\leq k} S\_?)}.
$$

**Fig. 2.** Example MDP with corresponding step bounded probabilities.

However, this upper bound cannot trivially be embedded in a value-iteration based procedure. Intuitively, in order to compute the upper bound for iteration k, one cannot necessarily build on the results of iteration k − 1.

*Example 6.* Consider the MDP M given in Fig. 2(a). Let $G = \{s_3, s_4\}$ be the set of goal states. We therefore have $S_? = \{s_0, s_1, s_2\}$. In Fig. 2(b) we list step-bounded probabilities with respect to the possible schedulers, where $\sigma_\alpha$, $\sigma_{\beta\alpha}$, and $\sigma_{\beta\beta}$ refer to schedulers with $\sigma_\alpha(s_0) = \alpha$ and, for $\gamma \in \{\alpha, \beta\}$, $\sigma_{\beta\gamma}(s_0) = \beta$ and $\sigma_{\beta\gamma}(s_0 \beta s_0) = \gamma$. Notice that the probability measures $\Pr^{\sigma}_{s_1}$ and $\Pr^{\sigma}_{s_2}$ are independent of the considered scheduler $\sigma$. For step bounds $k \in \{1, 2\}$ we get

– $\max_{\sigma \in \mathfrak{S}^{\mathcal{M}}} \hat{u}^{\sigma}_1 = \hat{u}^{\sigma_\alpha}_1 = 0 + 0.8 \cdot \max(0, 1, 0) = 0.8$ and
– $\max_{\sigma \in \mathfrak{S}^{\mathcal{M}}} \hat{u}^{\sigma}_2 = \hat{u}^{\sigma_{\beta\beta}}_2 = 0.42 + 0.16 \cdot \max(0.5, 0.19, 1) = 0.58$.

#### **4.2 Extending the Value Iteration Approach**

The idea of our algorithm is to compute the bounds for $\Pr^{\max}(\Diamond G)$ given in Theorem 4 for increasing $k \ge 0$. Algorithm 2 outlines the procedure. Similar to Algorithm 1 for MCs, vectors $x_k, y_k \in \mathbb{R}^{|S|}$ store the step-bounded probabilities $\Pr^{\sigma_k}_s(\Diamond^{\le k} G)$ and $\Pr^{\sigma_k}_s(\Box^{\le k} S_?)$ for any $s \in S$. In addition, schedulers $\sigma_k$ and upper bounds $u_k \ge \max_{s \in S_?} \Pr^{\max}_s(\Diamond G)$ are computed in a way that makes Theorem 4 applicable.

**Lemma 3.** *After executing* k *iterations of Algorithm 2 we have, for all* $s \in S_?$*, that* $x_k[s] = \Pr^{\sigma_k}_s(\Diamond^{\le k} G)$*,* $y_k[s] = \Pr^{\sigma_k}_s(\Box^{\le k} S_?)$*, and* $\ell_k \le \Pr^{\max}_s(\Diamond G) \le u_k$*, where* $\sigma_k \in \arg\max_{\sigma \in \mathfrak{S}^{\mathcal{M}}} \Pr^{\sigma}_s(\Diamond^{\le k} G) + \Pr^{\sigma}_s(\Box^{\le k} S_?) \cdot u_k$*.*

The lemma holds for $k = 0$ as $x_0$, $y_0$, and $u_0$ are initialized accordingly. For $k > 0$ we assume that the claim holds after $k-1$ iterations, i.e., for $x_{k-1}$, $y_{k-1}$, $u_{k-1}$ and scheduler $\sigma_{k-1}$. The results of the kth iteration are obtained as follows.

The function *findAction*, illustrated in Algorithm 3, determines the choices of a scheduler $\sigma_k \in \arg\max_{\sigma \in \mathfrak{S}^{\mathcal{M}}} \Pr^{\sigma}_s(\Diamond^{\le k} G) + \Pr^{\sigma}_s(\Box^{\le k} S_?) \cdot u_{k-1}$ for $s \in S_?$. The idea is to choose at state s an action $\sigma_k(s) = \alpha \in \mathit{Act}(s)$ that maximizes

$$\Pr\_s^{\sigma\_k}(\Diamond^{\leq k} G) + \Pr\_s^{\sigma\_k}(\Box^{\leq k} S\_?) \cdot u\_{k-1} = \sum\_{s' \in S} \mathbf{P}(s, \alpha, s') \cdot (x\_{k-1}[s'] + y\_{k-1}[s'] \cdot u\_{k-1}).$$

```
Input : MDP M = (S, Act, P, s_I, ρ), absorbing states G ⊆ S, precision ε > 0
Output: r ∈ R with |r − Pr^max(♦G)| < ε
 1  S_0 ← {s ∈ S | Pr^max_s(♦G) = 0}
 2  assert that M is contracting with respect to G ∪ S_0
 3  S_? ← S \ (S_0 ∪ G)
 4  initialize x_0, y_0 ∈ R^|S| with x_0[G] = 1, x_0[S \ G] = 0, y_0[S_?] = 1, y_0[S \ S_?] = 0
 5  ℓ_0 ← −∞;  u_0 ← +∞;  d_0 ← −∞
 6  k ← 0
 7  repeat
 8      k ← k + 1
 9      initialize x_k, y_k ∈ R^|S| with x_k[G] = 1, x_k[S_0] = 0, y_k[S \ S_?] = 0
10      d_k ← d_{k−1}
11      foreach s ∈ S_? do
12          α ← findAction(x_{k−1}, y_{k−1}, s, u_{k−1})
13          d_k ← max(d_k, decisionValue(x_{k−1}, y_{k−1}, s, α))
14          x_k[s] ← Σ_{s'∈S} P(s, α, s') · x_{k−1}[s']
15          y_k[s] ← Σ_{s'∈S} P(s, α, s') · y_{k−1}[s']
16      if y_k[s] < 1 for all s ∈ S_? then
17          ℓ_k ← max(ℓ_{k−1}, min_{s∈S_?} x_k[s] / (1 − y_k[s]))
18          u_k ← min(u_{k−1}, max(d_k, max_{s∈S_?} x_k[s] / (1 − y_k[s])))
19  until y_k[s_I] · (u_k − ℓ_k) < 2ε
20  return x_k[s_I] + y_k[s_I] · (ℓ_k + u_k) / 2
```

#### **Algorithm 2:** Sound value iteration for MDPs

For the case where no real upper bound is known (i.e., $u_{k-1} = \infty$), we implicitly assume a sufficiently large value for $u_{k-1}$ such that $\Pr^{\sigma}_s(\Diamond^{\le k} G)$ becomes negligible. Upon leaving state s, $\sigma_k$ mimics $\sigma_{k-1}$, i.e., we set $\sigma_k(s \alpha s_1 \alpha_1 \dots s_n) = \sigma_{k-1}(s_1 \alpha_1 \dots s_n)$. After executing Line 15 of Algorithm 2 we have $x_k[s] = \Pr^{\sigma_k}_s(\Diamond^{\le k} G)$ and $y_k[s] = \Pr^{\sigma_k}_s(\Box^{\le k} S_?)$.

It remains to derive an upper bound $u_k$. To ensure that Lemma 3 holds, we require (i) $u_k \ge \max_{s \in S_?} \Pr^{\max}_s(\Diamond G)$ and (ii) $u_k \in U_k$, where

$$U\_k = \{ u \in \mathbb{R} \mid \sigma\_k \in \operatorname\*{arg\,max}\_{\sigma \in \mathfrak{S}^{\mathcal{M}}} \Pr\_s^{\sigma}(\Diamond^{\leq k} G) + \Pr\_s^{\sigma}(\Box^{\leq k} S\_?) \cdot u \text{ for all } s \in S\_? \}.$$

Intuitively, the set $U_k \subseteq \mathbb{R}$ consists of all possible upper bounds u for which $\sigma_k$ is still optimal. $U_k$ is convex as it can be represented as a conjunction of inequalities, with $U_0 = \mathbb{R}$ and $u \in U_k$ if and only if $u \in U_{k-1}$ and, for all $s \in S_?$ with $\sigma_k(s) = \alpha$ and all $\beta \in \mathit{Act}(s) \setminus \{\alpha\}$,

$$\sum\_{s' \in S} \mathbf{P}(s, \alpha, s') \cdot (x\_{k-1}[s'] + y\_{k-1}[s'] \cdot u) \ge \sum\_{s' \in S} \mathbf{P}(s, \beta, s') \cdot (x\_{k-1}[s'] + y\_{k-1}[s'] \cdot u).$$

The algorithm maintains the so-called *decision value* d<sup>k</sup> which corresponds to the minimum of U<sup>k</sup> (or −∞ if the minimum does not exist). Algorithm 4 outlines the

```
1 function findAction(x, y, s, u)
2     if u ≠ ∞ then
3         return α ∈ arg max_{α ∈ Act(s)}  Σ_{s'∈S} P(s, α, s') · (x[s'] + y[s'] · u)
4     else
5         return α ∈ arg max_{α ∈ Act(s)}  Σ_{s'∈S} P(s, α, s') · y[s']
```
**Algorithm 3:** Computation of optimal action.

```
1 function decisionValue(x, y, s, α)
2   d ← −∞
3   foreach β ∈ Act(s) \ {α} do
4     yΔ ← Σ_{s'∈S} (P(s, α, s') − P(s, β, s')) · y[s']
5     if yΔ > 0 then
6       xΔ ← Σ_{s'∈S} (P(s, β, s') − P(s, α, s')) · x[s']
7       d ← max(d, xΔ/yΔ)
8   return d
```
**Algorithm 4:** Computation of decision value.

procedure to obtain the decision value at a given state. Our algorithm ensures that u_k is only set to a value in [d_k, u_{k−1}] ⊆ U_k.
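Algorithms 3 and 4 can be transcribed almost line for line into executable code. The following is a minimal sketch under our own data layout (not the Storm implementation): `P` is a nested dict `P[s][a]` mapping successor states to probabilities, `Act` maps each state to its enabled actions, and `x`, `y` are the vectors x_k, y_k keyed by state.

```python
import math

def find_action(P, Act, x, y, s, u):
    """Algorithm 3: pick an action maximizing x[s'] + y[s']*u
    (or just y[s'] when the upper bound u is still infinite)."""
    if u != math.inf:
        score = lambda a: sum(p * (x[t] + y[t] * u) for t, p in P[s][a].items())
    else:
        score = lambda a: sum(p * y[t] for t, p in P[s][a].items())
    return max(Act[s], key=score)

def decision_value(P, Act, x, y, s, alpha):
    """Algorithm 4: smallest u for which alpha stays optimal at s
    (-inf if no competing action constrains u)."""
    d = -math.inf
    for beta in Act[s]:
        if beta == alpha:
            continue
        y_delta = sum((P[s][alpha].get(t, 0.0) - P[s][beta].get(t, 0.0)) * y[t]
                      for t in y)
        if y_delta > 0:
            x_delta = sum((P[s][beta].get(t, 0.0) - P[s][alpha].get(t, 0.0)) * x[t]
                          for t in x)
            d = max(d, x_delta / y_delta)
    return d
```

Note that `decision_value(…, alpha)` returns exactly the break-even point x_Δ/y_Δ at which a competing action β would start to outperform α, matching the inequality defining U_k.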

**Lemma 4.** *After executing Line 18 of Algorithm 2:* u_k ≥ max_{s∈S_?} Pr_s^max(♦G)*.*

To show that u_k is a valid upper bound, let s_max ∈ arg max_{s∈S_?} Pr_s^max(♦G) and u* = Pr_{s_max}^max(♦G). From Theorem 4, u_{k−1} ≥ u*, and u_{k−1} ∈ U_k we get

$$\begin{split} &u^\* \leq \max\_{\sigma \in \mathfrak{S}^{\mathcal{M}}} \Pr\_{s\_{\max}}^{\sigma} (\Diamond^{\leq k} G) + \Pr\_{s\_{\max}}^{\sigma} (\Box^{\leq k} S\_{?}) \cdot u\_{k-1} \\ &= \Pr\_{s\_{\max}}^{\sigma\_k} (\Diamond^{\leq k} G) + \Pr\_{s\_{\max}}^{\sigma\_k} (\Box^{\leq k} S\_{?}) \cdot u\_{k-1} = x\_k [s\_{\max}] + y\_k [s\_{\max}] \cdot u\_{k-1} \end{split}$$

which yields a new upper bound x_k[s_max] + y_k[s_max] · u_{k−1} ≥ u*. We repeat this scheme as follows. Let v_0 := u_{k−1} and, for i > 0, let v_i := x_k[s_max] + y_k[s_max] · v_{i−1}. We can show that v_{i−1} ∈ U_k implies v_i ≥ u*. Assuming y_k[s_max] < 1, the sequence v_0, v_1, v_2, ... converges to v_∞ := lim_{i→∞} v_i = x_k[s_max] / (1 − y_k[s_max]). We distinguish three cases to show that u_k = min(u_{k−1}, max(d_k, max_{s∈S_?} x_k[s] / (1 − y_k[s]))) ≥ u*.
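The repeated-substitution argument above is a simple contraction and can be checked numerically. A sketch with illustrative values of our own choosing (x_k[s_max] = 0.3, y_k[s_max] = 0.6, initial bound v_0 = 1):

```python
def refine_bound(x, y, v0, steps):
    """Iterate v_i = x + y * v_{i-1}; for 0 <= y < 1 this
    contracts toward the fixed point x / (1 - y)."""
    v = v0
    for _ in range(steps):
        v = x + y * v
    return v

x, y = 0.3, 0.6
v_limit = x / (1 - y)          # = 0.75 for these values
v_50 = refine_bound(x, y, 1.0, 50)
```

After 50 iterations the bound agrees with the limit x/(1 − y) up to the contraction factor y^50, illustrating why each v_i is a progressively tighter upper bound.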


*Example 7.* Reconsider the MDP M from Fig. 2(a) and goal states G = {s_3, s_4}. The maximal reachability probability is attained by a scheduler that always chooses β at state s_0, which results in Pr^max(♦G) = 0.5. We now illustrate how Algorithm 2 approximates this value by sketching the first two iterations. In the first iteration, *findAction* yields action α at s_0. We obtain:

$$\begin{aligned} x\_1[s\_0] &= 0, \; x\_1[s\_1] = 0.1, \; x\_1[s\_2] = 0.1, \; y\_1[s\_0] = 0.8, \; y\_1[s\_1] = 0.9, \; y\_1[s\_2] = 0, \\ d\_1 &= 0.3/(0.8 - 0.4) = 0.75, \; \ell\_1 = \min(0, 1, 0) = 0, \; u\_1 = \max(0.75, 0, 1, 0) = 1. \end{aligned}$$

In the second iteration, *findAction* again yields α at s_0 and we get:

$$\begin{aligned} x\_2[s\_0] &= 0.08, \; x\_2[s\_1] = 0.19, \; x\_2[s\_2] = 0.1, \; y\_2[s\_0] = 0.72, \; y\_2[s\_1] = 0, \; y\_2[s\_2] = 0, \\ d\_2 &= 0.75, \; \ell\_2 = \min(0.29, 0.19, 0.1) = 0.1, \; u\_2 = \max(0.75, 0.29, 0.19, 0.1) = 0.75. \end{aligned}$$

Due to the decision value, we do not set the upper bound u_2 to 0.29 < Pr^max(♦G).

**Theorem 5.** *Algorithm 2 terminates for any MDP* M*, goal states* G*, and precision* ε > 0*. The returned value* r *satisfies* |r − Pr^max(♦G)| ≤ ε*.*

The correctness of the algorithm follows from Theorem 4 and Lemma 3. Termination follows since M is contracting with respect to S_0 ∪ G, implying lim_{k→∞} Pr^σ(□^{≤k} S_?) = 0. The optimizations for Algorithm 1 mentioned in Sect. 3.4 can be applied to Algorithm 2 as well.

**Fig. 3.** Comparison of sound value iteration (x-axis) and interval iteration (y-axis).

#### **5 Experimental Evaluation**

**Implementation.** We implemented sound value iteration for MCs and MDPs in the model checker Storm [8]. The implementation computes reachability probabilities and expected rewards using explicit data structures such as sparse matrices and vectors. Moreover, multi-objective model checking is supported, where we straightforwardly extend the value iteration-based approach of [22] to sound value iteration. We also implemented the optimizations given in Sect. 3.4.

The implementation is available at www.stormchecker.org.

**Experimental Results.** We considered a wide range of case studies including


In total, 130 model and property instances were considered. For CTMCs and Markov automata, we computed (untimed) reachability probabilities or expected rewards on the underlying MC and the underlying MDP, respectively. In all experiments the precision parameter was ε = 10⁻⁶.

We compare sound value iteration (SVI) with interval iteration (II) as presented in [12,13]. We consider the Gauss-Seidel variants of both approaches and compute initial bounds ℓ_0 and u_0 as in [12]. For a better comparison we consider the implementation of II in Storm; [21] gives a comparison with the implementation of II in PRISM. The experiments were run on a single core (2 GHz) of an HP BL685C G7 with 192 GB of available memory. However, almost all experiments required less than 4 GB. We measured model checking times and required iterations. All logfiles and considered benchmarks are available at [25].

Figure 3(a) depicts the model checking times for SVI (x-axis) and II (y-axis). For better readability, the benchmarks are divided into four plots with different scales. Triangles (▲) and circles (•) indicate MC and MDP benchmarks, respectively. Similarly, Fig. 3(b) shows the required iterations of the two approaches. We observe that SVI converged faster and required fewer iterations for almost all MCs and MDPs. SVI performed particularly well on the challenging instances where many iterations are required. Similar observations were made when comparing the topological variants of SVI and II. Both approaches were still competitive if no a priori bounds were given to SVI. More details are given in [21].

Figure 4 indicates the model checking times of SVI and II as well as their topological variants. For reference, we also consider standard (unsound) value iteration (VI). The x-axis depicts the number of instances that have been solved by the corresponding approach within the time limit indicated on the y-axis. Hence, a point (x, y) means that for x instances the model checking time was at most y. We observe that the topological variant of SVI yielded the best run times among all sound approaches and even competes with (unsound) VI.

**Fig. 4.** Runtime comparison between different approaches.

#### **6 Conclusion**

In this paper we presented a sound variant of the value iteration algorithm which safely approximates reachability probabilities and expected rewards in MCs and MDPs. Experiments on a large set of benchmarks indicate that our approach is a reasonable alternative to the recently proposed interval iteration algorithm.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **Safety-Aware Apprenticeship Learning**

Weichao Zhou and Wenchao Li(B)

Department of Electrical and Computer Engineering, Boston University, Boston, USA {zwc662,wenchao}@bu.edu

**Abstract.** Apprenticeship learning (AL) is a class of Learning from Demonstration techniques in which the reward function of a Markov Decision Process (MDP) is unknown to the learning agent, and the agent has to derive a good policy by observing an expert's demonstrations. In this paper, we study the problem of making AL algorithms inherently safe while still meeting their learning objective. We consider a setting where the unknown reward function is assumed to be a linear combination of a set of state features, and the safety property is specified in Probabilistic Computation Tree Logic (PCTL). By embedding probabilistic model checking inside AL, we propose a novel *counterexample-guided* approach that can ensure safety while retaining performance of the learnt policy. We demonstrate the effectiveness of our approach on several challenging AL scenarios where safety is essential.

#### **1 Introduction**

The rapid progress of artificial intelligence (AI) comes with a growing concern over its safety when deployed in real-life systems and situations. As highlighted in [3], if the objective function of an AI agent is wrongly specified, then maximizing that objective function may lead to harmful results. In addition, the objective function or the training data may focus only on accomplishing a specific task and ignore other aspects, such as safety constraints, of the environment. In this paper, we propose a novel framework that combines explicit safety specification with learning from data. We consider safety specification expressed in Probabilistic Computation Tree Logic (PCTL) and show how probabilistic model checking can be used to ensure safety and retain performance of a learning algorithm known as *apprenticeship learning* (AL).

We consider the formulation of apprenticeship learning by Abbeel and Ng [1]. The concept of AL is closely related to *reinforcement learning* (RL) where an agent learns what actions to take in an environment (known as a policy) by maximizing some notion of long-term reward. In AL, however, the agent is not given the reward function, but instead has to first estimate it from a set of expert demonstrations via a technique called *inverse reinforcement learning* [18]. The formulation assumes that the reward function is expressible as a linear combination of *known state features*. An expert demonstrates the task by maximizing this reward function and the agent tries to derive a policy that can match the feature expectations of the expert's demonstrations. Apprenticeship learning can also be

© The Author(s) 2018

H. Chockler and G. Weissenbacher (Eds.): CAV 2018, LNCS 10981, pp. 662–680, 2018. https://doi.org/10.1007/978-3-319-96145-3\_38

viewed as an instance of the class of techniques known as Learning from Demonstration (LfD). One issue with LfD is that *the expert often can only demonstrate how the task works but not how the task may fail*. This is because failure may cause irrecoverable damages to the system such as crashing a vehicle. In general, the lack of "negative examples" can cause a heavy bias in how the learning agent constructs the reward estimate. In fact, *even if all the demonstrations are safe, the agent may still end up learning an unsafe policy*.

The key idea of this paper is to incorporate formal verification in apprenticeship learning. We are inspired by the line of work on formal inductive synthesis [10] and counterexample-guided inductive synthesis [22]. Our approach is also similar in spirit to the recent work on safety-constrained reinforcement learning [11]. However, our approach uses the results of model checking in a novel way. We consider safety specification expressed in probabilistic computation tree logic (PCTL). We employ a verification-in-the-loop approach by embedding PCTL model checking as a safety checking mechanism inside the learning phase of AL. In particular, when a learnt policy does not satisfy the PCTL formula, we leverage counterexamples generated by the model checker to steer the policy search in AL. In essence, counterexample generation can be viewed as supplementing negative examples for the learner. Thus, the learner will try to find a policy that not only imitates the expert's demonstrations but also stays away from the failure scenarios as captured by the counterexamples.

In summary, we make the following contributions in this paper.


The rest of the paper is organized as follows. Section 2 reviews background information on apprenticeship learning and PCTL model checking. Section 3 defines the safety-aware apprenticeship learning problem and gives an overview of our approach. Section 4 illustrates the counterexample-guided learning framework. Section 5 describes the proposed algorithm in detail. Section 6 presents a set of experimental results demonstrating the effectiveness of our approach. Section 7 discusses related work. Section 8 concludes and offers future directions.

#### **2 Preliminaries**

#### **2.1 Markov Decision Process and Discrete-Time Markov Chain**

A Markov Decision Process (MDP) is a tuple M = (S, A, T, γ, s_0, R), where S is a finite set of states; A is a set of actions; T : S × A × S → [0, 1] is a transition function describing the probability of transitioning from one state s ∈ S to another state by taking action a ∈ A in state s; R : S → ℝ is a reward function which maps each state s ∈ S to a real number indicating the reward of being in state s; s_0 ∈ S is the initial state; and γ ∈ [0, 1) is a discount factor which describes how future rewards attenuate when a sequence of transitions is made. A deterministic and stationary (or memoryless) policy π : S → A for an MDP M is a mapping from states to actions, i.e. the policy deterministically selects what action to take solely based on the current state. In this paper, we restrict ourselves to deterministic and stationary policies. A policy π for an MDP M induces a Discrete-Time Markov Chain (DTMC) M_π = (S, T_π, s_0), where T_π : S × S → [0, 1] is the probability of transitioning from a state s to another state in one step. A trajectory τ = s_0 → s_1 → s_2 → ..., with T_π(s_i, s_{i+1}) > 0, is a sequence of states with s_i ∈ S. The accumulated reward of τ is Σ_{i=0}^∞ γ^i R(s_i). The value function V_π : S → ℝ measures the expected accumulated reward E[Σ_{i=0}^∞ γ^i R(s_i)] starting from a state s and following policy π. An *optimal policy* π for MDP M is a policy that maximizes the value function [4].
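The definitions above translate directly into code. The following is a minimal sketch (the class names and dict-based representation are ours, not the paper's) of an MDP, the DTMC induced by a deterministic stationary policy, and the value function computed by fixed-point iteration:

```python
class MDP:
    def __init__(self, S, A, T, gamma, s0, R):
        # T[s][a] maps successor states s' to T(s, a, s')
        self.S, self.A, self.T = S, A, T
        self.gamma, self.s0, self.R = gamma, s0, R

def induce_dtmc(mdp, policy):
    """A deterministic policy pi: S -> A turns the MDP into a DTMC:
    T_pi[s] maps s' to T(s, pi(s), s')."""
    return {s: mdp.T[s][policy[s]] for s in mdp.S}

def value_function(mdp, policy, iters=1000):
    """V_pi(s) = E[sum_t gamma^t R(s_t)], approximated by iterating
    the Bellman equation for the induced DTMC."""
    T_pi = induce_dtmc(mdp, policy)
    V = {s: 0.0 for s in mdp.S}
    for _ in range(iters):
        V = {s: mdp.R[s] + mdp.gamma * sum(p * V[t] for t, p in T_pi[s].items())
             for s in mdp.S}
    return V
```

For γ < 1 the Bellman operator is a contraction, so the iteration converges geometrically to V_π from any starting vector.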

#### **2.2 Apprenticeship Learning via Inverse Reinforcement Learning**

*Inverse reinforcement learning (IRL)* aims at recovering the reward function R of M\R = (S, A, T, γ, s_0) from a set of m trajectories Γ_E = {τ_0, τ_1, ..., τ_{m−1}} demonstrated by an expert. *Apprenticeship learning (AL)* [1] assumes that the reward function is a linear combination of state features, i.e. R(s) = ω^T f(s), where f : S → [0, 1]^k is a vector of known features over states S and ω ∈ ℝ^k is an unknown weight vector that satisfies ||ω||_2 ≤ 1. The expected features of a policy π are the expected values of the cumulative discounted state features f(s) obtained by following π on M, i.e. μ_π = E[Σ_{t=0}^∞ γ^t f(s_t) | π]. Let μ_E denote the expected features of the unknown expert policy π_E. μ_E can be approximated by the expected features of the expert's m demonstrated trajectories, μ̂_E = (1/m) Σ_{τ∈Γ_E} Σ_{t=0}^∞ γ^t f(s_t), if m is large enough. With a slight abuse of notation, we use μ_Γ to also denote the expected features of a set of paths Γ. Given an error bound ε, a policy π* is defined to be *ε-close* to π_E if its expected features μ_{π*} satisfy ||μ_E − μ_{π*}||_2 ≤ ε. The expected features of a policy can be calculated using Monte Carlo methods, value iteration or linear programming [1,4].

The algorithm proposed by Abbeel and Ng [1] starts with a random policy π_0 and its expected features μ_{π_0}. Assuming that in iteration i a set of i candidate policies Π = {π_0, π_1, ..., π_{i−1}} and their corresponding expected features {μ_π | π ∈ Π} have been found, the algorithm solves the following optimization problem:

$$\delta = \max\_{\omega} \min\_{\pi \in \Pi} \omega^T (\hat{\mu}\_E - \mu\_\pi) \qquad s.t. \ ||\omega||\_2 \le 1 \tag{1}$$

The optimal ω is used to find the corresponding optimal policy π_i and its expected features μ_{π_i}. If δ ≤ ε, the algorithm terminates and π_i is produced as the output. Otherwise, μ_{π_i} is added to the feature set of the candidate policy set Π and the algorithm continues to the next iteration.
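The expected features μ_π play the same role for f that V_π plays for R, and can be computed by the same fixed-point iteration. A sketch under our own naming (`T_pi[s]` maps successor states to probabilities, `feats[s]` is the k-dimensional feature vector f(s)):

```python
import numpy as np

def expected_features(T_pi, feats, gamma, iters=2000):
    """mu_pi(s) = f(s) + gamma * sum_{s'} T_pi(s, s') * mu_pi(s'),
    iterated to a fixed point (a vector-valued Bellman backup)."""
    S = sorted(T_pi)
    mu = {s: np.zeros_like(feats[s], dtype=float) for s in S}
    for _ in range(iters):
        mu = {s: feats[s] + gamma * sum(p * mu[t] for t, p in T_pi[s].items())
              for s in S}
    return mu
```

Averaging the discounted feature sums of the m demonstrated trajectories gives the empirical estimate μ̂_E in exactly the same way, with the expectation replaced by a sample mean.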

#### **2.3 PCTL Model Checking**

Probabilistic model checking can be used to verify properties of a stochastic system such as "is the probability that the agent reaches the unsafe area within 10 steps smaller than 5%?". *Probabilistic Computation Tree Logic* (PCTL) [7] allows for probabilistic quantification of properties. The syntax of PCTL includes state formulas and path formulas [13]. A state formula φ asserts a property of a single state s ∈ S, whereas a path formula ψ asserts a property of a trajectory.

$$\phi ::= \text{true} \mid l\_i \mid \neg \phi\_i \mid \phi\_i \land \phi\_j \mid P\_{\bowtie p^\*} [\psi] \tag{2}$$

$$\psi ::= \mathbf{X}\phi \mid \phi\_1 \mathbf{U}^{\leq k} \phi\_2 \mid \phi\_1 \mathbf{U} \phi\_2 \tag{3}$$

where l_i is an atomic proposition and φ_i, φ_j are state formulas; ⋈ ∈ {≤, ≥, <, >}; P_{⋈p*}[ψ] means that the probability of generating a trajectory that satisfies formula ψ is ⋈ p*. **X**φ asserts that the next state after the initial state in the trajectory satisfies φ; φ_1 **U**^{≤k} φ_2 asserts that φ_2 is satisfied within at most k transitions and all preceding states satisfy φ_1; φ_1 **U** φ_2 asserts that φ_2 will eventually be satisfied and all preceding states satisfy φ_1. The semantics of PCTL is defined by a satisfaction relation |= as follows.

$$s \models \text{true} \quad \text{iff state } s \in S \tag{4}$$

$$s \models \phi \quad \text{iff state } s \text{ satisfies the state formula } \phi \tag{5}$$

$$\tau \models \psi \quad \text{iff trajectory } \tau \text{ satisfies the path formula } \psi. \tag{6}$$

Additionally, |=_min denotes the minimal satisfaction relation [6] between τ and ψ. Defining pref(τ) as the set of all prefixes of trajectory τ including τ itself, τ |=_min ψ iff (τ |= ψ) ∧ (∀τ' ∈ pref(τ)\{τ}, τ' ⊭ ψ). For instance, if ψ = φ_1 **U**^{≤k} φ_2, then for any finite trajectory τ |=_min φ_1 **U**^{≤k} φ_2, only the final state in τ satisfies φ_2. Let P(τ) be the probability of transitioning along a trajectory τ and let Γ_ψ be the set of all finite trajectories that satisfy τ |=_min ψ; the value of the PCTL property ψ is defined as P_{=?|s_0}[ψ] = Σ_{τ∈Γ_ψ} P(τ). For a DTMC M_π and a state formula φ = P_{≤p*}[ψ], M_π |= φ iff P_{=?|s_0}[ψ] ≤ p*.

A *counterexample* of φ is a set cex ⊆ Γ_ψ that satisfies Σ_{τ∈cex} P(τ) > p*. Let P(Γ) = Σ_{τ∈Γ} P(τ) be the sum of probabilities of all trajectories in a set Γ. Let CEX_φ ⊆ 2^{Γ_ψ} be the set of all counterexamples for a formula φ, such that (∀cex ∈ CEX_φ, P(cex) > p*) and (∀Γ ∈ 2^{Γ_ψ}\CEX_φ, P(Γ) ≤ p*). A *minimal counterexample* is a set cex ∈ CEX_φ such that ∀cex' ∈ CEX_φ, |cex| ≤ |cex'|. By converting the DTMC M_π into a weighted directed graph, a counterexample can be found by solving a k-shortest paths (KSP) problem or a hop-constrained KSP (HKSP) problem [6]. Alternatively, counterexamples can be found by using Satisfiability Modulo Theories solving or mixed integer linear programming to determine the minimal critical subsystems that capture the counterexamples in M_π [23].

A policy can also be synthesized by solving the objective min_π P_{=?}[ψ] for an MDP M. This problem can be solved by linear programming or policy iteration (and by value iteration for step-bounded reachability) [14].
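For the step-bounded fragment φ_1 **U**^{≤k} φ_2 used later, the probability P_{=?|s_0}[ψ] on a DTMC can be computed by a simple backward recursion. A sketch (the dict representation is ours; `phi1` and `phi2` are the sets of states satisfying the respective state formulas):

```python
def bounded_until_prob(T_pi, phi1, phi2, k, s0):
    """Probability that a trajectory from s0 in the DTMC T_pi[s] = {s': prob}
    satisfies phi1 U^{<=k} phi2, by k backward dynamic-programming steps."""
    prob = {s: 1.0 if s in phi2 else 0.0 for s in T_pi}
    for _ in range(k):
        prob = {s: 1.0 if s in phi2
                else (sum(p * prob[t] for t, p in T_pi[s].items())
                      if s in phi1 else 0.0)
                for s in T_pi}
    return prob[s0]
```

A state formula P_{≤p*}[φ_1 **U**^{≤k} φ_2] then holds in s_0 iff this value is at most p*, which is exactly the check the verifier performs in the approach below.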

#### **3 Problem Formulation and Overview**

Suppose there are some unsafe states in an MDP\R M = (S, A, T, γ, s_0). A safety issue in apprenticeship learning means that an agent following the learnt policy would have a higher probability of entering those unsafe states than it should. There are multiple reasons that can give rise to this issue. First, it is possible that the expert policy π_E itself has a high probability of reaching the unsafe states. Second, human experts often tend to perform only successful demonstrations that do not highlight the unwanted situations [21]. This *lack of negative examples* in the training set can cause the learning agent to be unaware of the existence of those unsafe states.

**Fig. 1.** The 8 × 8 grid-world. (a) Lighter grid cells have higher rewards than the darker ones. The two black grid cells have the lowest rewards, while the two white ones have the highest rewards. The grid cells enclosed by red lines are considered *unsafe*. (b) The blue line is an example trajectory demonstrated by the expert. (c) Only the goal states are assigned high rewards and there is little difference between the unsafe states and their nearby states. As a result, the learnt policy has a high probability of passing through the unsafe states as indicated by the cyan line. (d) p<sup>∗</sup> = 20%. The learnt policy is optimal to a reward function that correctly assigns low rewards to the unsafe states. (Color figure online)

We use an 8 × 8 grid-world navigation example, as shown in Fig. 1, to illustrate this problem. An agent starts from the upper-left corner and moves from cell to cell until it reaches the lower-right corner. The 'unsafe' cells are enclosed by the red lines; these represent regions that the agent should avoid. In each step, the agent can choose to stay in the current cell or move to an adjacent cell, but with a 20% chance of moving randomly instead of following its decision. The goal area, the unsafe area and the reward mapping for all states are shown in Fig. 1(a). For each state s ∈ S, its feature vector consists of 4 radial basis feature functions with respect to the squared Euclidean distances between s and the 4 states with the highest or lowest rewards as shown in Fig. 1(a). In addition, a specification Φ formalized in PCTL is used to capture the safety requirement. In (7), p* is the required upper bound on the probability of reaching an unsafe state within t = 64 steps.

$$\Phi ::= P\_{\leq p^\*} [\texttt{true } \mathbf{U}^{\leq t}\ \texttt{unsafe}] \tag{7}$$
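Radial basis features of the kind described above can be sketched as follows; the kernel width `scale` is our assumption, since the paper does not fix it in this passage:

```python
import math

def grid_features(s, landmarks, scale=1.0):
    """One radial-basis feature per landmark cell c: exp(-||s - c||^2 / scale),
    where ||s - c||^2 is the squared Euclidean distance between grid cells."""
    return [math.exp(-((s[0] - c[0]) ** 2 + (s[1] - c[1]) ** 2) / scale)
            for c in landmarks]
```

With the 4 highest- and lowest-reward cells as landmarks, each feature lies in [0, 1] and peaks at 1 exactly on its landmark, matching the requirement f : S → [0, 1]^k.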

Let π_E be the optimal policy under the reward map shown in Fig. 1(a). The probability of entering an unsafe region within 64 steps by following π_E is 24.6%. Now consider the scenario where the expert performs a number of demonstrations by following π_E. *All demonstrated trajectories in this case successfully reach the goal areas without ever passing through any of the unsafe regions.* Figure 1(b) shows a representative trajectory (in blue) among 10,000 such demonstrated trajectories. The resulting reward map obtained by running the AL algorithm on these 10,000 demonstrations is shown in Fig. 1(c). Observe that only the goal area has been learnt, whereas the agent is oblivious to the unsafe regions (treating them in the same way as other dark cells). In fact, the probability of reaching an unsafe state within 64 steps with this policy turns out to be 82.6% (thus violating the safety requirement by a large margin). To make matters worse, the value of p* may be decided or revised after a policy has been learnt. In those cases, even the original expert policy π_E may be unsafe, e.g., when p* = 20%. Thus, we need to adapt the original AL algorithm so that it takes such safety requirements into account. Figure 1(d) shows the resulting reward map learned using our proposed algorithm (to be described in detail later) for p* = 20%. It clearly matches well with the color differentiation in the original reward map and captures both the goal states and the unsafe regions. This policy has an unsafe probability of 19.0%. We are now ready to state our problem.

**Definition 1.** *The safety-aware apprenticeship learning (SafeAL) problem is, given an* MDP\R*, a set of* m *trajectories* {τ_0, τ_1, ..., τ_{m−1}} *demonstrated by an expert, and a specification* Φ*, to learn a policy* π *that satisfies* Φ *and is ε-close to the expert policy* π_E*.*

*Remark 1.* We note that a solution may not always exist for the SafeAL problem. While the decision problem of checking whether a solution exists is of theoretical interest, in this paper, we focus on tackling the problem of finding a policy π that satisfies a PCTL formula Φ (if Φ is satisfiable) and whose performance is as close to that of the expert's as possible, i.e. we relax the condition of μ_π being ε-close to μ_E.

#### **4 A Framework for Safety-Aware Learning**

In this section, we describe a general framework for safety-aware learning. This novel framework utilizes information from both the expert demonstrations and a verifier. The proposed framework is illustrated in Fig. 2. Similar to the *counterexample-guided inductive synthesis* (CEGIS) paradigm [22], our framework consists of a *verifier* and a *learner*. The verifier checks if a candidate policy satisfies the safety specification Φ. In case Φ is not satisfied, the verifier generates a counterexample for Φ. The main difference from CEGIS is that our framework considers not only functional correctness, e.g., safety, but also performance (as captured by the learning objective). Starting from an initial policy π_0, each time the learner learns a new policy, the verifier checks if the specification is satisfied. If true, then this policy is added to the candidate set; otherwise the verifier generates a (minimal) counterexample and adds it to the counterexample set. During the learning phase, the learner uses both the counterexample set and the candidate set to find a policy that is close to the (unknown) expert policy and far away from the counterexamples. The goal is to find a policy that is ε-close to the expert policy and satisfies the specification. For the grid-world example introduced in Sect. 3, when p* = 5% (thus presenting a stricter safety requirement than the expert policy π_E satisfies), our approach produces a policy with only a 4.2% probability of reaching an unsafe state within 64 steps (with the correspondingly inferred reward mapping shown in Fig. 1(d)).

**Fig. 2.** Our safety-aware learning framework. Given an initial policy π_0, a specification Φ and a learning objective (as captured by ε), the framework iterates between a *verifier* and a *learner* to search for a policy π* that satisfies both Φ and the learning objective ε. One invariant that this framework maintains is that all the π_i's in the candidate policy set satisfy Φ.

Learning from a (minimal) counterexample cex_π of a policy π is similar to learning from expert demonstrations. The basic principle of the AL algorithm proposed in [1] is to find a weight vector ω under which the expected reward of π_E maximally outperforms any mixture of the policies in the candidate policy set Π = {π_0, π_1, π_2, ...}. Thus, ω can be viewed as the normal vector of the hyperplane ω^T(μ − μ_E) = 0 that has the maximal distance to the convex hull of the set {μ_π | π ∈ Π}, as illustrated in the 2D feature space in Fig. 3(a). It can be shown

**Fig. 3.** (a) Learn from expert. (b) Learn from both expert demonstrations and counterexamples.

that ω^T μ_E ≥ ω^T μ_π for all previously found policies π. Intuitively, this helps to move the candidate μ_π closer to μ_E. Similarly, we can apply the same max-margin separation principle to maximize the distance between the candidate policies and the counterexamples (in the μ space). Let CEX = {cex_0, cex_1, cex_2, ...} denote the set of counterexamples of the policies that do not satisfy the specification Φ. Maximizing the distance between the convex hulls of the sets {μ_cex | cex ∈ CEX} and {μ_π | π ∈ Π} is equivalent to maximizing the distance between the parallel supporting hyperplanes of the two convex hulls, as shown in Fig. 3(b). The corresponding optimization function is given in Eq. (8).

$$\delta = \max\_{\omega} \min\_{\pi \in \Pi,\, cex \in CEX} \omega^T (\mu\_{\pi} - \mu\_{cex}) \qquad s.t.\ ||\omega||\_2 \le 1 \tag{8}$$

To attain good performance similar to that of the expert, we still want to learn from μE. Thus, the overall problem can be formulated as a multi-objective optimization problem that combines (1) and (8) into (9).

$$\max\_{\omega} \min\_{\pi \in \Pi, \tilde{\pi} \in \Pi, cex \in CEX} \left( \omega^T (\mu\_E - \mu\_\pi), \ \omega^T (\mu\_{\tilde{\pi}} - \mu\_{cex}) \right) \qquad s.t. \ ||\omega||\_2 \le 1 \tag{9}$$
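The optimizations (1), (8) and (9) are all max-margin problems over the unit ball. One simple (though not necessarily the most efficient) way to approximate the optimal ω is projected subgradient ascent; the sketch below is our own illustration, not the solver used in the paper:

```python
import numpy as np

def max_margin_weight(mu_E, mus, iters=500, eta=0.1):
    """Approximately solve max_{||w||_2 <= 1} min_{mu in mus} w.(mu_E - mu)
    by subgradient ascent on the concave piecewise-linear objective,
    projecting back onto the unit ball after each step."""
    w = np.zeros_like(mu_E, dtype=float)
    for _ in range(iters):
        worst = min(mus, key=lambda mu: w @ (mu_E - mu))  # active minimizer
        w = w + eta * (mu_E - worst)                      # subgradient step
        n = np.linalg.norm(w)
        if n > 1.0:
            w /= n                                        # projection
    return w
```

The inner `min` picks the candidate policy whose expected features are currently hardest to separate from μ_E, which is exactly the term that determines the margin δ.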

#### **5 Counterexample-Guided Apprenticeship Learning**

In this section, we introduce the CounterExample Guided Apprenticeship Learning (CEGAL) algorithm to solve the SafeAL problem. It can be viewed as a special case of the safety-aware learning framework described in the previous section. In addition to combining policy verification, counterexample generation and AL, our approach uses an adaptive weighting scheme to weight the separation from μ_E against the separation from μ_cex.

$$\begin{cases} \max\_{\omega} \min\_{\pi \in \Pi\_S,\, \tilde{\pi} \in \Pi\_S,\, cex \in CEX} \omega^T (k(\mu\_E - \mu\_\pi) + (1 - k)(\mu\_{\tilde{\pi}} - \mu\_{cex})) \\ s.t.\ ||\omega||\_2 \le 1,\ k \in [0, 1] \\ \quad\ \omega^T(\mu\_E - \mu\_\pi) \le \omega^T(\mu\_E - \mu\_{\pi'}),\ \forall \pi' \in \Pi\_S \\ \quad\ \omega^T(\mu\_{\tilde{\pi}} - \mu\_{cex}) \le \omega^T(\mu\_{\tilde{\pi}'} - \mu\_{cex'}),\ \forall \tilde{\pi}' \in \Pi\_S,\ \forall cex' \in CEX \end{cases} \tag{10}$$

In essence, we take a weighted-sum approach to solving the multi-objective optimization problem (9). Assume that Π_S = {π_1, π_2, π_3, ...} is a set of candidate policies that all satisfy Φ and CEX = {cex_1, cex_2, cex_3, ...} is a set of counterexamples. We introduce a parameter k and change (9) into the weighted-sum optimization problem (10). Note that π and π̃ can be different. The optimal ω solved from (10) can be used to generate a new policy π_ω by using algorithms such as policy iteration. We use a probabilistic model checker, such as PRISM [13], to check if π_ω satisfies Φ. If it does, then π_ω is added to Π_S. Otherwise, a counterexample generator, such as COMICS [9], is used to generate a (minimal) counterexample cex_{π_ω}, which is added to CEX.

**Algorithm 1.** Counterexample-Guided Apprenticeship Learning (CEGAL)

```
 1: Input:
 2:   M ← a partially known MDP M\R;  f ← a vector of feature functions
 3:   μ_E ← the expected features of expert trajectories {τ_0, τ_1, ..., τ_m}
 4:   Φ ← specification;  ε ← error bound for the expected features
 5:   σ, α ∈ (0, 1) ← error bound σ and step length α for the parameter k
 6: Initialization:
 7:   if ||μ_E − μ_π0||_2 ≤ ε then return π_0        ▷ π_0 is the initial safe policy
 8:   Π_S ← {π_0}, CEX ← ∅                           ▷ initialize candidate and counterexample sets
 9:   inf ← 0, sup ← 1, k ← sup                      ▷ initialize multi-optimization parameter k
10:   π_1 ← policy learnt from μ_E via apprenticeship learning
11: Iteration i (i ≥ 1):
12: Verifier:
13:   status ← ModelChecker(M, π_i, Φ)
14:   if status = SAT then go to Learner
15:   if status = UNSAT then
16:     cex_πi ← CounterexampleGenerator(M, π_i, Φ)
17:     add cex_πi to CEX, solve μ_cexπi, go to Learner
18: Learner:
19:   if status = SAT then
20:     if ||μ_E − μ_πi||_2 ≤ ε then return π* ← π_i
21:       terminate                                  ▷ π_i is ε-close to π_E
22:     add π_i to Π_S, inf ← k, k ← sup             ▷ update Π_S and inf, reset k
23:   if status = UNSAT then
24:     if |k − inf| ≤ σ then return π* ← argmin_{π ∈ Π_S} ||μ_E − μ_π||_2
25:       terminate                                  ▷ k is too close to its lower bound
26:     k ← α · inf + (1 − α) · k                    ▷ decrease k to learn for safety
27:   ω_{i+1} ← argmax_ω min_{π ∈ Π_S, π̃ ∈ Π_S, cex ∈ CEX} ω^T (k(μ_E − μ_π) + (1 − k)(μ_π̃ − μ_cex))
28:   ▷ the multi-objective optimization function recovers AL when k = 1
29:   π_{i+1}, μ_π{i+1} ← the optimal policy π_{i+1} and its expected features for the MDP M with reward R(s) = ω^T_{i+1} f(s)
30: Go to next iteration
```

Algorithm 1 describes CEGAL in detail. With a constant sup = 1 and a variable inf ∈ [0, sup] as the upper and lower bounds respectively, the learner determines the value of k within [inf, sup] in each iteration, depending on the outcome of the verifier, and uses this k when solving (10) in line 27. Like most nonlinear optimization algorithms, this algorithm requires an initial guess: an initial safe policy π<sub>0</sub> that makes Π<sub>S</sub> nonempty. A good initial candidate is the maximally safe policy, obtained for example using PRISM-games [15]. Without loss of generality, we assume this policy satisfies Φ. Suppose that in iteration i an intermediate policy π<sub>i</sub>, learnt by the learner in iteration i − 1, is verified to satisfy Φ; then we increase inf to inf = k and reset k to k = sup, as shown in line 22. If π<sub>i</sub> does not satisfy Φ, then we reduce k to k = α · inf + (1 − α)k, as shown in line 26, where α ∈ (0, 1) is a step length parameter. If |k − inf| ≤ σ and π<sub>i</sub> still does not satisfy Φ, the algorithm chooses from Π<sub>S</sub> the best safe policy π<sup>∗</sup>, i.e., the one with the smallest margin to π<sub>E</sub>, as shown in line 24. If π<sub>i</sub> satisfies Φ and is ε-close to π<sub>E</sub>, the algorithm outputs π<sub>i</sub>, as shown in line 20. When π<sub>i</sub> satisfies Φ and inf = sup = k = 1, solving (10) is equivalent to solving (1) as in the original AL algorithm.
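The adaptive adjustment of k between the verifier and the learner (lines 22 and 26 of Algorithm 1) can be sketched as a single update function; the function name and signature are our own illustration:

```python
def update_k(k, inf, sup, alpha, is_safe):
    """One CEGAL update of the trade-off parameter k.

    If the verifier accepts the learnt policy, raise the lower bound
    to the current k and reset k to sup (line 22); otherwise move k
    geometrically toward inf (line 26). Returns the new (k, inf).
    """
    if is_safe:
        return sup, k
    return alpha * inf + (1 - alpha) * k, inf
```

Starting from k = sup = 1 with inf = 0 and α = 0.5, an unsafe round halves k to 0.5; a subsequent safe round resets k to 1 and raises inf to 0.5.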

*Remark 2.* The initial policy π<sup>0</sup> does not have to be maximally safe, although such a policy can be used to verify if Φ is satisfiable at all. Naively safe policies often suffice for obtaining a safe and performant output at the end. Such a policy can be obtained easily in many settings, e.g., in the grid-world example one safe policy is simply staying in the initial cell. In both cases, π<sup>0</sup> typically has very low performance since satisfying Φ is the only requirement for it.

**Theorem 1.** *Given an initial policy* π<sub>0</sub> *that satisfies* Φ*, Algorithm 1 is guaranteed to output a policy* π<sup>∗</sup>*, such that (1)* π<sup>∗</sup> *satisfies* Φ*, and (2) the performance of* π<sup>∗</sup> *is at least as good as that of* π<sub>0</sub> *when compared to* π<sub>E</sub>*, i.e.,* ||μ<sub>E</sub> − μ<sub>π∗</sub>||<sub>2</sub> ≤ ||μ<sub>E</sub> − μ<sub>π0</sub>||<sub>2</sub>*.*

*Proof Sketch.* The first part of the guarantee can be proven by case splitting. Algorithm 1 outputs π<sup>∗</sup> either when π<sup>∗</sup> satisfies Φ and is ε-close to π<sub>E</sub>, or when |k − inf| ≤ σ in some iteration. In the first case, π<sup>∗</sup> clearly satisfies Φ. In the second case, π<sup>∗</sup> is selected from the set Π<sub>S</sub>, which contains all the policies found to satisfy Φ so far, so π<sup>∗</sup> satisfies Φ. For the second part of the guarantee, the initial policy π<sub>0</sub> is the final output π<sup>∗</sup> if π<sub>0</sub> satisfies Φ and is ε-close to π<sub>E</sub>. Otherwise, π<sub>0</sub> is added to Π<sub>S</sub> since it satisfies Φ. If |k − inf| ≤ σ in some iteration, the final output is π<sup>∗</sup> = argmin<sub>π∈Π<sub>S</sub></sub> ||μ<sub>E</sub> − μ<sub>π</sub>||<sub>2</sub>, and since π<sub>0</sub> ∈ Π<sub>S</sub> it must satisfy ||μ<sub>E</sub> − μ<sub>π∗</sub>||<sub>2</sub> ≤ ||μ<sub>E</sub> − μ<sub>π0</sub>||<sub>2</sub>. If a learnt policy π<sup>∗</sup> satisfies Φ and is ε-close to π<sub>E</sub>, then Algorithm 1 outputs π<sup>∗</sup> without adding it to Π<sub>S</sub>; since ||μ<sub>E</sub> − μ<sub>π</sub>||<sub>2</sub> > ε for all π ∈ Π<sub>S</sub>, we again have ||μ<sub>E</sub> − μ<sub>π∗</sub>||<sub>2</sub> ≤ ||μ<sub>E</sub> − μ<sub>π0</sub>||<sub>2</sub>.

*Discussion.* In the worst case, CEGAL returns the initial safe policy. However, this may simply be because no policy exists that simultaneously satisfies Φ and is ε-close to the expert's demonstrations. Compared to AL, which offers no safety guarantee, and to computing the maximally safe policy, which may have very poor performance, CEGAL provides a principled way of guaranteeing safety while retaining performance.

*Convergence.* Algorithm 1 is guaranteed to terminate. Let inf<sub>t</sub> be the t-th assigned value of inf. After inf<sub>t</sub> is assigned, k is decreased from k<sub>0</sub> = sup iteratively via k<sub>i</sub> = α · inf<sub>t</sub> + (1 − α)k<sub>i−1</sub> until either |k<sub>i</sub> − inf<sub>t</sub>| ≤ σ in line 24 or a new safe policy is found in line 18. The update of k satisfies the following equality.

$$\frac{|k\_{i+1} - \inf\_t|}{|k\_i - \inf\_t|} = \frac{\alpha \cdot \inf\_t + (1 - \alpha)k\_i - \inf\_t}{k\_i - \inf\_t} = 1 - \alpha \tag{11}$$

Thus, it takes no more than $1 + \log\_{1-\alpha} \frac{\sigma}{\sup - \inf\_t}$ iterations either for the algorithm to terminate in line 24 or for a new safe policy to be found in line 18. If a new safe policy is found in line 18, inf is reassigned in line 22 to the current value of k, i.e., inf<sub>t+1</sub> = k, which satisfies inf<sub>t+1</sub> − inf<sub>t</sub> ≥ (1 − α)σ. After the assignment of inf<sub>t+1</sub>, the iterative update of k resumes. Since sup − inf<sub>t</sub> ≤ 1, the following inequality holds.

$$\frac{|\inf\_{t+1} - \sup|}{|\inf\_t - \sup|} \le \frac{\sup - \inf\_t - (1 - \alpha)\sigma}{\sup - \inf\_t} \le 1 - (1 - \alpha)\sigma \tag{12}$$

Starting from an initial inf = inf<sub>0</sub> < sup, with the alternating updates of inf and k, inf keeps getting closer to sup unless the algorithm terminates in line 24 or a safe policy ε-close to π<sub>E</sub> is found in line 20. In the extreme case, inf = sup after no more than $\frac{\sup - \inf\_0}{(1-\alpha)\sigma}$ updates of inf, and the problem reduces to AL. Therefore, the worst case of this algorithm has two phases. In the first phase, inf increases from inf = 0 to inf = sup. Between two consecutive updates (t, t + 1) of inf, there are no more than $\log\_{1-\alpha} \frac{(1-\alpha)\sigma}{\sup - \inf\_t}$ updates of k before inf is increased from inf<sub>t</sub> to inf<sub>t+1</sub>. Overall, this phase takes no more than

$$\sum\_{0 \le i < \frac{\sup - \inf\_0}{(1 - \alpha)\sigma}} \log\_{1 - \alpha} \frac{(1 - \alpha)\sigma}{\sup - \inf\_0 - i \cdot (1 - \alpha)\sigma} = \sum\_{0 \le i < \frac{1}{(1 - \alpha)\sigma}} \log\_{1 - \alpha} \frac{(1 - \alpha)\sigma}{1 - i \cdot (1 - \alpha)\sigma} \tag{13}$$

iterations to reduce the multi-objective optimization problem to original apprenticeship learning, at which point the second phase begins. Since k = sup, the iteration stops immediately when an unsafe policy is learnt, as in line 24. This phase takes no more iterations than the original AL algorithm does to converge; the convergence result of AL is given in [1].
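The contraction in (11) and the resulting bound on the number of k-updates per phase can be sanity-checked numerically; the sketch below uses α = 0.5 and σ = 10<sup>−5</sup> (the values from the experiments in Sect. 6) with inf fixed at 0:

```python
import math

# Each unsafe iteration contracts the gap |k - inf_t| by exactly 1 - alpha
# (Eq. (11)); the loop therefore needs at most
# 1 + log_{1-alpha}(sigma / (sup - inf_t)) iterations to get sigma-close.
alpha, sigma, sup, inf_t = 0.5, 1e-5, 1.0, 0.0

k, steps = sup, 0
while abs(k - inf_t) > sigma:
    k_next = alpha * inf_t + (1 - alpha) * k
    ratio = abs(k_next - inf_t) / abs(k - inf_t)
    assert abs(ratio - (1 - alpha)) < 1e-12  # Eq. (11)
    k, steps = k_next, steps + 1

bound = 1 + math.log(sigma / (sup - inf_t), 1 - alpha)
assert steps <= bound
```

With these values the loop halves the gap 17 times, comfortably within the bound of about 17.6.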

In each iteration, the algorithm first solves a second-order cone programming (SOCP) problem (10) to learn a policy. SOCP problems can be solved in polynomial time by interior-point (IP) methods [12]. PCTL model checking for DTMCs can be done in time linear in the size of the formula and polynomial in the size of the state space [7]. Counterexample generation can be done either by enumerating paths using the k-shortest-path algorithm or by determining a critical subsystem using either an SMT formulation or mixed integer linear programming (MILP) [23]. For the k-shortest-path-based algorithm, enumerating a large number of paths (i.e., a large k) can be computationally expensive when p<sup>∗</sup> is large. This can be alleviated by using a smaller p<sup>∗</sup> during the calculation, which is equivalent to considering only paths with high probabilities.
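As a toy stand-in for the k-shortest-path style of counterexample generation, the following best-first search enumerates the most probable paths of a DTMC into a set of bad states until their accumulated probability exceeds p<sup>∗</sup>; the dictionary encoding of the DTMC is an assumption of this sketch, not the COMICS interface:

```python
import heapq

def counterexample(dtmc, init, bad, p_star, max_paths=10000):
    """Enumerate most-probable paths reaching `bad` until their total
    probability exceeds p_star, yielding a counterexample for the
    property P_{<= p_star}[F bad]. `dtmc[s]` maps successors of s to
    transition probabilities."""
    heap = [(-1.0, (init,))]  # entries: (negated path probability, path)
    cex, total = [], 0.0
    while heap and len(cex) < max_paths:
        neg_p, path = heapq.heappop(heap)
        p, s = -neg_p, path[-1]
        if s in bad:
            cex.append((path, p))
            total += p
            if total > p_star:
                break  # accumulated evidence already refutes the bound
            continue
        for succ, q in dtmc.get(s, {}).items():
            heapq.heappush(heap, (-p * q, path + (succ,)))
    return cex, total
```

On a small chain where the bad state is reached with probability 0.7, a threshold of p<sup>∗</sup> = 0.5 is refuted after the two most probable bad paths have been collected.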

#### **6 Experiments**

We evaluate our algorithm on three case studies: (1) grid-world, (2) cart-pole, and (3) mountain-car. The cart-pole environment<sup>1</sup> and the mountain-car environment<sup>2</sup> are obtained from OpenAI Gym. All experiments are carried out on a quad-core i7-7700K processor running at 3.6 GHz with 16 GB of memory. Our prototype tool was implemented in Python<sup>3</sup>. The parameters are γ = 0.99, ε = 10, σ = 10<sup>−5</sup>, α = 0.5, and the maximum number of iterations is 50. For the OpenAI Gym experiments, in each step the agent sends an action to the OpenAI environment, and the environment returns an observation and a reward (0 or 1). We show that our algorithm can guarantee safety while retaining the performance of the learnt policy compared with using AL alone.

#### **6.1 Grid World**

We first evaluate the scalability of our tool using the grid-world example. Table 1 shows the average runtime (per iteration) of the individual components of our tool as the size of the grid world increases. The first and second columns indicate the size of the grid world and the resulting state space. The third column shows the average runtime that policy iteration takes to compute an optimal policy π for a known reward function. The fourth column indicates the average runtime that policy iteration takes to compute the expected features μ for a known policy. The fifth column indicates the average runtime of verifying the PCTL formula using PRISM. The last column indicates the average runtime of generating a counterexample using COMICS.


**Table 1.** Average runtime per iteration in seconds.

#### **6.2 Cart-Pole from OpenAI Gym**

In the cart-pole environment as shown in Fig. 4(a), the goal is to keep the pole on a cart from falling over as long as possible by moving the cart either to the left or to the right in each time step. The maximum step length is t = 200. The

<sup>1</sup> https://github.com/openai/gym/wiki/CartPole-v0.

<sup>2</sup> https://github.com/openai/gym/wiki/MountainCar-v0.

<sup>3</sup> https://github.com/zwc662/CAV2018.

position, velocity and angle of the cart and the pole are continuous values and observable, but the actual dynamics of the system are unknown<sup>4</sup>.

**Fig. 4.** (a) The cart-pole environment. (b) The cart is at −0.3 and the pole angle is −20°. (c) The cart is at 0.3 and the pole angle is 20°.

A maneuver is deemed *unsafe* if the pole angle exceeds ±20° while the cart's horizontal position exceeds ±0.3, as shown in Fig. 4(b) and (c). We formalize the safety requirement in PCTL as (14).

$$\Phi ::= P\_{\leq p^\*} \left[ true \ \mathbf{U}^{\leq t} \ (angle \leq -20^\circ \land position \leq -0.3) \lor (angle \geq 20^\circ \land position \geq 0.3) \right] \tag{14}$$
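As a plain predicate on an observed state, the unsafe region inside (14) reads as follows (the variable names are hypothetical stand-ins for the Gym observation):

```python
def unsafe_state(angle_deg, position):
    """Unsafe region of Eq. (14): the pole leans past 20 degrees toward
    the side on which the cart has already drifted past 0.3."""
    return (angle_deg <= -20 and position <= -0.3) or \
           (angle_deg >= 20 and position >= 0.3)
```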


**Table 2.** In the cart-pole environment, *higher* average steps mean better performance. The safest policy is synthesized using PRISM-games.

We used 2000 demonstrations in which the pole is held upright for all 200 steps without violating any of the safety conditions. The safest policy synthesized by PRISM-games is used as the initial safe policy. We also compare the policies learnt by CEGAL for different safety thresholds p<sup>∗</sup>. In Table 2, the policies are compared in terms of the model checking results

<sup>4</sup> The MDP is built from sampled data. The feature vector in each state contains 30 radial basis functions, which depend on the squared Euclidean distances between the current state and 30 other states uniformly distributed in the state space.

('MC Result') on the PCTL property in (14) using the constructed MDP, the average number of steps ('Avg. Steps') that a policy (executed in the OpenAI environment) can hold the pole across 5000 rounds (the higher the better), and the number of iterations ('Num. of Iters') it takes for the algorithm to terminate (either converge to an ε-close policy, terminate due to σ, or terminate after 50 iterations). The policy in the first row is the result of using AL alone, which has the best performance but also a 49.1% probability of violating the safety requirement. The safest policy, shown in the second row, is always safe but has almost no performance at all: it simply lets the pole fall and thus never risks moving the cart out of the range [−0.3, 0.3]. The policies learnt using CEGAL, on the other hand, always satisfy the safety requirement. From p<sup>∗</sup> = 30% down to 10%, the performance of the learnt policy is comparable to that of the AL policy. However, when the safety threshold becomes very low, e.g., p<sup>∗</sup> = 5%, the performance of the learnt policy drops significantly. This reflects the phenomenon that the tighter the safety condition, the less room the agent has to maneuver to achieve good performance.

#### **6.3 Mountain-Car from OpenAI Gym**

Our third experiment uses the mountain-car environment from OpenAI Gym. As shown in Fig. 5(a), a car starts from the bottom of the valley and tries to reach the mountaintop on the right as quickly as possible. In each time step the car can perform one of three actions: accelerating to the left, coasting, or accelerating to the right. The agent fails if the step length reaches the maximum (t = 66). The velocity and position of the car are continuous values and observable, while the exact dynamics are unknown<sup>5</sup>. In this game setting, the car cannot reach the right mountaintop by simply accelerating to the right. It has to accumulate momentum first by moving back and forth in the valley. The safety rules we enforce are shown in Fig. 5(b). They correspond to speed limits when the car is close to the left or the right mountaintop (in case there is a cliff on the other side of the mountaintop). Similar to the previous experiments, we used 2000 expert demonstrations, all of which successfully reach the right mountaintop without violating any of the safety conditions. The average number of steps the expert needs to drive the car to the right mountaintop is 40. We formalize the safety requirement in PCTL as (15).

$$\Phi ::= P\_{\leq p^\*} \left[ true \ \mathbf{U}^{\leq t} \ (speed \leq -0.04 \land position \leq -1.1) \lor (speed \geq 0.04 \land position \geq 0.5) \right] \tag{15}$$

We compare the different policies using the same set of categories as in the cart-pole example. The numbers are averaged over 5000 runs. As shown in the

<sup>5</sup> The MDP is built from sampled data. The feature vector for each state contains 2 exponential functions and 18 radial basis functions, the latter depending on the squared Euclidean distances between the current state and 18 other states uniformly distributed in the state space.

**Fig. 5.** (a) The original mountain-car environment. (b) The mountain-car environment with traffic rules: when the distance from the car to the left edge or the right edge is shorter than 0.1, the speed of the car should be lower than 0.04.

first row, the policy learnt via AL<sup>6</sup> has the highest probability of going over the speed limits. We observed that this policy made the car speed up all the way to the left mountaintop to maximize its potential energy. The safest policy corresponds to simply staying at the bottom of the valley. The policies learnt via CEGAL for safety thresholds p<sup>∗</sup> ranging from 60% to 50% not only have a lower probability of violating the speed limits but also achieve comparable performance. As the safety threshold p<sup>∗</sup> decreases further, the agent becomes more conservative and it takes more time for the car to finish the task. For p<sup>∗</sup> = 20%, the agent never succeeds in reaching the top within 66 steps (Table 3).


**Table 3.** In the mountain-car environment, *lower* average steps mean better performance. The safest policy is synthesized via PRISM-games.

#### **7 Related Work**

A taxonomy of AI safety problems is given in [3], where the issues of misspecified objectives or rewards and insufficient or poorly curated training data are highlighted. There have been several attempts to address these issues from different angles. The problem of *safe exploration* is studied in [8,17]. In particular, the latter work proposes to add a safety constraint, evaluated by the amount

<sup>6</sup> AL did not converge to an -close policy in 50 iterations in this case.

of damage, to the optimization problem, so that the optimal policy can maximize the return without violating the limit on the expected damage. An obvious shortcoming of this approach is that actual failures have to occur in order to properly assess damage.

Formal methods have been applied to the problem of AI safety. In [5], the authors propose to combine machine learning and reachability analysis for dynamical models to achieve high performance and guarantee safety. In this work, we focus on probabilistic models, which are natural in many modern machine learning methods. In [20], the authors propose to use formal specifications to synthesize a control policy for reinforcement learning. They consider formal specifications captured in Linear Temporal Logic, whereas we consider PCTL, which is a better match for the underlying probabilistic model. Recently, the problem of *safe reinforcement learning* was explored in [2], where a monitor (called a shield) is used to enforce temporal logic properties either during the learning phase or the execution phase of the reinforcement learning algorithm. The shield provides a list of safe actions each time the agent makes a decision, so that the temporal property is preserved. In [11], the authors also propose an approach for controller synthesis in reinforcement learning. In this case, an SMT solver is used to find a scheduler (policy) for the synchronous product of an MDP and a DTMC so that it satisfies both a probabilistic reachability property and an expected cost property. Another approach that leverages PCTL model checking is proposed in [16]. A so-called abstract Markov decision process (AMDP) model of the environment is first built, and PCTL model checking is then used to check the satisfiability of the safety specification. Our work is similar to these in spirit in its application of formal methods. However, while the concept of AL is closely related to reinforcement learning, an agent in the AL paradigm needs to learn a policy from demonstrations without knowing the reward function a priori.

A distinguishing characteristic of our method is the tight integration of formal verification with learning from data (apprenticeship learning in particular). Among imitation or apprenticeship learning methods, margin-based algorithms [1,18,19] try to maximize the margin between the expert's policy and all learnt policies until the one with the smallest margin is produced. The apprenticeship learning algorithm proposed by Abbeel and Ng [1] was largely motivated by support vector machines (SVMs), in that the features of the expert demonstration are maximally separated from the features of all other candidate policies. Our algorithm makes use of this observation when using counterexamples to steer the policy search process. Recently, the idea of learning from failed demonstrations has started to emerge. In [21], the authors propose an IRL algorithm that can learn from both successful and failed demonstrations. It does so by reformulating the maximum entropy algorithm of [24] to find a policy that maximally deviates from the failed demonstrations while approaching the successful ones as much as possible. However, this entropy-based method requires obtaining many failed demonstrations and can be very costly in practice.

Finally, our approach is inspired by the work on formal inductive synthesis [10] and counterexample-guided inductive synthesis (CEGIS) [22]. These frameworks typically combine a constraint-based synthesizer with a verification oracle. In each iteration, the agent refines her hypothesis (i.e. generates a new candidate solution) based on counterexamples provided by the oracle. Our approach can be viewed as an extension of CEGIS where the objective is not just functional correctness but also meeting certain learning criteria.

### **8 Conclusion and Future Work**

We propose a counterexample-guided approach for combining probabilistic model checking with apprenticeship learning to ensure safety of the apprenticeship learning outcome. Our approach makes novel use of counterexamples to steer the policy search process by reformulating the feature matching problem into a multi-objective optimization problem that additionally takes safety into account. Our experiments indicate that the proposed approach can guarantee safety and retain performance for a set of benchmarks including examples drawn from OpenAI Gym. In the future, we would like to explore other imitation or apprenticeship learning algorithms and extend our techniques to those settings.

**Acknowledgement.** This work is funded in part by the DARPA BRASS program under agreement number FA8750-16-C-0043 and NSF grant CCF-1646497.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **Deciding Probabilistic Bisimilarity Distance One for Labelled Markov Chains**

Qiyi Tang(B) and Franck van Breugel

DisCoVeri Group, York University, Toronto, Canada *{*qiyitang,franck*}*@eecs.yorku.ca

**Abstract.** Probabilistic bisimilarity is an equivalence relation that captures which states of a labelled Markov chain behave the same. Since this behavioural equivalence only identifies states that transition to states that behave exactly the same with exactly the same probability, this notion of equivalence is not robust. Probabilistic bisimilarity distances provide a quantitative generalization of probabilistic bisimilarity. The distance of states captures the similarity of their behaviour. The smaller the distance, the more alike the states behave. In particular, states are probabilistic bisimilar if and only if their distance is zero. This quantitative notion is robust in that small changes in the transition probabilities result in small changes in the distances.

During the last decade, several algorithms have been proposed to approximate and compute the probabilistic bisimilarity distances. The main result of this paper is an algorithm that decides distance one in O(n<sup>2</sup> + m<sup>2</sup>), where n is the number of states and m is the number of transitions of the labelled Markov chain. This decision procedure is the key new ingredient of our algorithm to compute the distances. The state-of-the-art algorithm can compute distances for labelled Markov chains with up to 150 states. For one such labelled Markov chain, that algorithm takes more than 49 h. In contrast, our new algorithm takes only 13 ms. Furthermore, our algorithm can compute distances for labelled Markov chains with more than 10,000 states in less than 50 min.

**Keywords:** Labelled Markov chain · Probabilistic bisimilarity · Probabilistic bisimilarity distance

#### **1 Introduction**

A *behavioural equivalence* captures which states of a model give rise to the same behaviour. Bisimilarity, due to Milner [22] and Park [25], is one of the best known behavioural equivalences. Verifying that an implementation satisfies a specification boils down to checking that the model of the implementation gives rise to the same behaviour as the model of the specification, that is, the models are behaviourally equivalent (see [1, Chap. 3]).

In this paper, we focus on models of probabilistic systems. These models can capture randomized algorithms, probabilistic protocols, biological systems and

many other systems in which probabilities play a central role. In particular, we consider *labelled Markov chains*, that is, Markov chains the states of which are labelled.

The above example shows how the behaviour of rolling a die can be mimicked by flipping a coin, an example due to Knuth and Yao [19]. Six of the states are labelled with the values of a die and the other states are labelled zero. In this example, we are interested in the labels representing the value of a die. As the reader can easily verify, the states with these labels are each reached with probability 1/6 from the initial, topmost state. In general, labels are used to identify particular states that have properties of interest. As a consequence, states with different labels are not behaviourally equivalent.

*Probabilistic bisimilarity*, due to Larsen and Skou [21], is a key behavioural equivalence for labelled Markov chains. As shown by Katoen et al. [16], minimizing a labelled Markov chain by identifying those states that are probabilistic bisimilar speeds up model checking. Probabilistic bisimilarity only identifies those states that behave exactly the same with exactly the same probability. If, for example, we replace the fair coin in the above example with a biased one, then none of the states labelled with zero in the original model with the fair coin are behaviourally equivalent to any of the states labelled with zero in the model with the biased coin. Behavioural equivalences like probabilistic bisimilarity rely on the transition probabilities and, as a result, are sensitive to minor changes of those probabilities. That is, such behavioural equivalences are not robust, as first observed by Giacalone et al. [12].

The *probabilistic bisimilarity distances* that we study in this paper were first defined by Desharnais et al. in [11]. Each pair of states of a labelled Markov chain is assigned a distance, a real number in the unit interval [0, 1]. This distance captures the similarity of the behaviour of the states. The smaller the distance, the more alike the states behave. In particular, states have distance zero if and only if they are probabilistic bisimilar. This provides a quantitative generalization of probabilistic bisimilarity that is robust in that small changes in the transition probabilities give rise to small changes in the distances. For example, we can model a biased die by using a biased coin instead of a fair coin in the above example. Let us assume that the probability of heads for the biased coin, that is, of going to the left, is 51/100. A state labelled zero in the model of the fair die has a *non-trivial* distance, that is, a distance greater than zero and smaller than one, to the corresponding state in the model of the biased die. For example, the initial states have distance about 0.036. We refer the reader to [7] for a more detailed discussion of a similar example.

As we already mentioned earlier, behavioural equivalences can be used to verify that an implementation satisfies a specification. Similarly, the distances can be used to check how similar an implementation is to a specification. We also mentioned that probabilistic bisimilarity can be used to speed up model checking. The distances can be used in a similar way, by identifying those states that behave almost the same, that is, have a small distance (see [3,23,26]).

We focus in this paper on computing the probabilistic bisimilarity distances. In particular, we present a *decision procedure* for *distance one*. That is, we compute the set of pairs of states that have distance one. Recall that distance one is the maximal distance and, therefore, captures that states behave very differently. States with different labels have distance one. However, also states with the same label can have distance one, as the next example illustrates.

Instead of computing the set of state pairs that have distance one, we compute the complement, that is, the set of state pairs with distance smaller than one. Obviously, the set of state pairs with distance zero is included in this set. First, we decide distance zero. As we mentioned earlier, distance zero coincides with probabilistic bisimilarity. The first decision procedure for probabilistic bisimilarity was provided by Baier [4]. More efficient decision procedures were subsequently proposed by Derisavi et al. [10] and also by Valmari and Franceschinis [30]. The latter two both run in O(m log n), where n and m are the number of states and transitions of the labelled Markov chain. Subsequently, we use a traversal of a directed graph derived from the labelled Markov chain. This traversal takes O(n<sup>2</sup> + m<sup>2</sup>).
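Deciding distance zero amounts to computing probabilistic bisimilarity. A naive partition-refinement sketch conveys the idea (this quadratic-style version is far less efficient than the O(m log n) algorithms cited above, and the dictionary encoding of the chain is our own assumption):

```python
def prob_bisim_classes(states, label, trans):
    """Refine the partition induced by labels until any two states in a
    block agree on the probability of jumping into each block; the stable
    partition is probabilistic bisimilarity (distance zero).
    `trans[s]` maps successors of s to transition probabilities."""
    block = {s: label(s) for s in states}  # initial partition: by label
    while True:
        sig = {}
        for s in states:
            acc = {}  # probability mass sent into each current block
            for t, p in trans.get(s, {}).items():
                acc[block[t]] = acc.get(block[t], 0.0) + p
            sig[s] = (block[s], tuple(sorted(acc.items())))
        if len(set(sig.values())) == len(set(block.values())):
            return block  # no block was split: fixed point reached
        block = sig
```

Two states with the same label whose successors land in the same blocks with the same probabilities end up in the same class.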

The decision procedures for distance zero and one can be used to compute or approximate probabilistic bisimilarity distances as indicated below.

Once we have computed the sets D<sup>0</sup> and D<sup>1</sup> of state pairs that have distance zero or one, we can easily compute the number of state pairs with non-trivial distances. If the number of non-trivial distances is small, then we can use the *simple policy iteration* (SPI) algorithm due to Bacci et al. [2] to compute those distances. Otherwise, we can either compute all distances smaller than a chosen ε > 0 or we can approximate the distances up to some chosen accuracy α > 0. In the former case, we first compute a query set Q of state pairs containing all state pairs whose distances are at most ε. Subsequently, we apply the *simple partial policy iteration* (SPPI) algorithm due to Bacci et al. [2] to compute the distances for all state pairs in Q. In the latter case, we start with a pair of distance functions, one being a lower bound and the other being an upper bound of the probabilistic bisimilarity distances, and iteratively improve the accuracy of those until they are α-close. We call this new approximation algorithm *distance iteration* (DI) as it is similar in spirit to Bellman's value iteration [5].

Chen et al. [8] presented an algorithm to compute the distances by means of Khachiyan's ellipsoid method [17]. Though the algorithm runs in polynomial time, in practice it is not as efficient as the policy iteration algorithms (see the examples in [28, Sect. 8]). The state-of-the-art algorithm to compute the probabilistic bisimilarity distances consists of two components: D₀ and SPI. To compare this algorithm with our new algorithm consisting of the components D₀, D₁ and SPI, we implemented all the components in Java and ran both implementations on several labelled Markov chains. These labelled Markov chains model randomized algorithms and probabilistic protocols that are part of the distribution of probabilistic model checkers such as PRISM [20]. Whereas the original state-of-the-art algorithm can handle labelled Markov chains with up to 150 states, our new algorithm can handle more than 10,000 states. Furthermore, for one such labelled Markov chain with 150 states, the original algorithm takes more than 49 h, whereas our new algorithm takes only 13 ms. Also, the new algorithm consisting of the components D₀, D₁, Q and SPPI to compute only small distances, along with the new algorithm consisting of the components D₀, D₁ and DI to approximate the distances, yields even lower execution times for a number of the labelled Markov chains.

The main contributions of this paper are

- a polynomial time decision procedure for probabilistic bisimilarity distance one,
- a new algorithm, consisting of the components D₀, D₁ and SPI, to compute all distances,
- a new algorithm, consisting of the components D₀, D₁, Q and SPPI, to compute only the distances smaller than some chosen ε > 0, and
- a new algorithm, consisting of the components D₀, D₁ and DI, to approximate the distances up to some chosen accuracy α > 0.
Furthermore, by means of experiments we have shown that these three new algorithms are very effective, improving significantly on the state of the art.

#### **2 Labelled Markov Chains and Probabilistic Bisimilarity Distances**

We start by reviewing the model of interest, labelled Markov chains, its most well-known behavioural equivalence, probabilistic bisimilarity due to Larsen and Skou [21], and the probabilistic bisimilarity pseudometric due to Desharnais et al. [11]. We denote the set of rational probability distributions on a set S by Distr(S). For μ ∈ Distr(S), its support is defined by support(μ) = { s ∈ S | μ(s) > 0 }. Instead of S × S, we often write S².

**Definition 1.** *A labelled Markov chain is a tuple* ⟨S, L, τ, ℓ⟩ *consisting of*

- *a nonempty finite set* S *of states,*
- *a nonempty finite set* L *of labels,*
- *a transition function* τ : S → Distr(S)*, and*
- *a labelling function* ℓ : S → L*.*

For the remainder of this section, we fix such a labelled Markov chain ⟨S, L, τ, ℓ⟩.

**Definition 2.** *Let* μ*,* ν ∈ Distr(S)*. The set* Ω(μ, ν) *of couplings of* μ *and* ν *is defined by*

$$\Omega(\mu, \nu) = \left\{ \omega \in \text{Distr}(S^2) \; \middle| \; \begin{array}{l} \forall s \in S: \sum\_{t \in S} \omega(s, t) = \mu(s) \land \\ \forall t \in S: \sum\_{s \in S} \omega(s, t) = \nu(t) \end{array} \right\}.$$

Note that ω ∈ Ω(μ, ν) is a joint probability distribution with marginals μ and ν. The following proposition will be used to prove Proposition 5.

**Proposition 1.** *For all* μ*,* ν ∈ Distr(S) *and* X ⊆ S²*,*

∀ω ∈ Ω(μ, ν) : support(ω) ⊆ X *if and only if* support(μ) × support(ν) ⊆ X*.*
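The right-to-left direction of Proposition 1 is witnessed by the independent (product) coupling, whose support is exactly support(μ) × support(ν); the left-to-right direction follows because that same coupling is among the ω quantified over. A small sketch (hypothetical Python, with distributions represented as dictionaries mapping outcomes to probabilities):

```python
from itertools import product

def product_coupling(mu, nu):
    """The independent coupling omega(u, v) = mu(u) * nu(v)."""
    return {(u, v): p * q for (u, p), (v, q) in product(mu.items(), nu.items())}

def support(d):
    return {x for x, p in d.items() if p > 0}

mu = {'s': 0.5, 't': 0.5}
nu = {'t': 1.0}

omega = product_coupling(mu, nu)

# Its marginals are mu and nu, so it is indeed a coupling ...
assert all(abs(sum(p for (u, _), p in omega.items() if u == x) - mu[x]) < 1e-9 for x in mu)
assert all(abs(sum(p for (_, v), p in omega.items() if v == y) - nu[y]) < 1e-9 for y in nu)
# ... and its support is exactly support(mu) x support(nu).
assert support(omega) == set(product(support(mu), support(nu)))
```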

**Definition 3.** *An equivalence relation* R ⊆ S² *is a probabilistic bisimulation if for all* (s, t) ∈ R*,* ℓ(s) = ℓ(t) *and there exists* ω ∈ Ω(τ(s), τ(t)) *such that* support(ω) ⊆ R*. Probabilistic bisimilarity, denoted* ∼*, is the largest probabilistic bisimulation.*

The probabilistic bisimilarity pseudometric of Desharnais et al. [11] maps each pair of states of a labelled Markov chain to a distance, an element of the unit interval [0, 1]. Hence, the pseudometric is a function from S² to [0, 1], that is, an element of [0, 1]<sup>S²</sup>. As we will discuss below, it can be defined as a fixed point of the following function.

**Definition 4.** *The function* Δ : [0, 1]<sup>S²</sup> → [0, 1]<sup>S²</sup> *is defined by*

$$\Delta(d)(s,t) = \begin{cases} 1 & \text{if } \ell(s) \neq \ell(t) \\ \min\_{\omega \in \Omega(\tau(s), \tau(t))} \sum\_{u,v \in S} \omega(u,v) \, d(u,v) \, \text{otherwise} \end{cases}$$

Since a concave function on a convex polytope attains its minimum (see [18, p. 260]), the above minimum exists. We will use this fact in Proposition 4, one of the key technical results in this paper. We endow the set [0, 1]<sup>S²</sup> of functions from S² to [0, 1] with the following partial order: d ⊑ e if d(s, t) ≤ e(s, t) for all s, t ∈ S. The set [0, 1]<sup>S²</sup> together with this order forms a complete lattice (see [9, Chap. 2]). The function Δ is monotone (see [6, Sect. 3]). According to the Knaster-Tarski fixed point theorem [29, Theorem 1], a monotone function on a complete lattice has a least fixed point. Hence, Δ has a least fixed point, which we denote by *μ*(Δ). This fixed point assigns to each pair of states their probabilistic bisimilarity distance.
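As a small worked example (ours, not taken from the paper): consider states s, t, u with ℓ(s) = ℓ(t) = a, ℓ(u) = b, τ(s)(s) = 1, τ(u)(u) = 1, τ(t)(s) = 1 − p and τ(t)(u) = p for some rational 0 < p < 1. The pairs with different labels get distance one, and since τ(s) is a point mass, Ω(τ(s), τ(t)) contains the single coupling ω with ω(s, s) = 1 − p and ω(s, u) = p, so the least fixed point satisfies

$$\mu(\Delta)(s,t) = (1-p)\,\mu(\Delta)(s,s) + p\,\mu(\Delta)(s,u) = (1-p)\cdot 0 + p\cdot 1 = p.$$

Thus (s, t) has a non-trivial distance, while (s, u) and (t, u) have distance one.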

Given that *μ*(Δ) captures the probabilistic bisimilarity distances, we define the following sets.

$$\begin{aligned} D\_0 &= \{ (s, t) \in S^2 \mid \mu(\Delta)(s, t) = 0 \} \\ D\_1 &= \{ (s, t) \in S^2 \mid \mu(\Delta)(s, t) = 1 \} \end{aligned}$$

The probabilistic bisimilarity pseudometric *μ*(Δ) provides a quantitative generalization of probabilistic bisimilarity as captured by the following result by Desharnais et al. [11, Theorem 1].

**Theorem 1.** D₀ = { (s, t) ∈ S² | s ∼ t }*.*

#### **3 Distance One**

We concluded the previous section with the characterization of D₀ as the set of state pairs that are probabilistically bisimilar. In this section we present a characterization of D₁ as a fixed point of the function introduced in Definition 5.

Let us consider the case that the probabilistic bisimilarity distance of states s and t is one, that is, *μ*(Δ)(s, t) = 1. Then Δ(*μ*(Δ))(s, t) = 1. From the definition of Δ, we can conclude that either ℓ(s) ≠ ℓ(t), or for all couplings ω ∈ Ω(τ(s), τ(t)) we have support(ω) ⊆ D₁.

We partition the set S² of state pairs into

$$\begin{aligned} S\_0^2 &= \{ (s, t) \in S^2 \mid s \sim t \} \\ S\_1^2 &= \{ (s, t) \in S^2 \mid \ell(s) \neq \ell(t) \} \\ S\_?^2 &= S^2 \backslash (S\_0^2 \cup S\_1^2) \end{aligned}$$

Hence, if *μ*(Δ)(s, t) = 1, then either (s, t) ∈ S₁², or (s, t) ∈ S<sub>?</sub>² and for all couplings ω ∈ Ω(τ(s), τ(t)) we have support(ω) ⊆ D₁. This leads us to the following function.

**Definition 5.** *The function* Γ : 2<sup>S²</sup> → 2<sup>S²</sup> *is defined by*

$$\Gamma(X) = S\_1^2 \cup \{ (s, t) \in S\_?^2 \mid \forall \omega \in \Omega(\tau(s), \tau(t)): \text{support}(\omega) \subseteq X \}. $$

**Proposition 2.** *The function* Γ *is monotone.*

Since the set 2<sup>S²</sup> of subsets of S² endowed with the order ⊆ is a complete lattice (see [9, Example 2.6(2)]) and the function Γ is monotone, we can conclude from the Knaster-Tarski fixed point theorem that Γ has a greatest fixed point, which we denote by *ν*(Γ). Next, we show that D₁ is a fixed point of Γ.
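Combined with Proposition 1, the universal quantification over couplings in Γ reduces to checking support(τ(s)) × support(τ(t)) ⊆ X, so ν(Γ) can be computed by iterating Γ downwards from the top element S². A minimal sketch (hypothetical Python, on an illustrative three-state chain in which no two distinct states are bisimilar):

```python
from itertools import product

# A toy labelled Markov chain: tau maps a state to its distribution, ell labels it.
tau = {'s': {'s': 1.0}, 't': {'s': 0.9, 'u': 0.1}, 'u': {'u': 1.0}}
ell = {'s': 'a', 't': 'a', 'u': 'b'}

S2 = set(product(tau, tau))
S0 = {(x, x) for x in tau}               # here S_0^2 is just the diagonal
S1 = {(x, y) for (x, y) in S2 if ell[x] != ell[y]}
SQ = S2 - S0 - S1                        # the remaining pairs, S_?^2

def gamma(X):
    """Gamma(X), with the coupling quantifier replaced via Proposition 1."""
    ok = lambda s, t: all((u, v) in X for u in tau[s] for v in tau[t])
    return S1 | {(s, t) for (s, t) in SQ if ok(s, t)}

# Greatest fixed point: iterate Gamma from the top element S^2.
X = S2
while gamma(X) != X:
    X = gamma(X)

assert X == S1   # here distance one holds exactly for the differently labelled pairs
```

In this chain the pair (s, t) drops out in the second iteration, because its transitions can be coupled through (s, s), a pair of distance zero.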

#### **Proposition 3.** D₁ = Γ(D₁)*.*

Since we have already seen that D₁ is a fixed point of Γ, we have that D₁ ⊆ *ν*(Γ). To conclude that D₁ is the greatest fixed point of Γ, it remains to show that *ν*(Γ) ⊆ D₁, which is equivalent to the following.

**Proposition 4.** *ν*(Γ) \ D₁ = ∅*.*

*Proof.* Towards a contradiction, assume that *ν*(Γ) \ D₁ ≠ ∅. Let

$$m = \min\_{(s,t) \in \nu(\varGamma) \backslash D\_1} \mu(\Delta)(s,t) \qquad \text{and} \qquad M = \{ (s,t) \in \nu(\varGamma) \backslash D\_1 \mid \mu(\Delta)(s,t) = m \}.$$

Since *ν*(Γ) \ D₁ ≠ ∅, we have that M ≠ ∅. Furthermore,

$$M \subseteq \nu(\Gamma) \backslash D\_1. \tag{1}$$

Since *ν*(Γ) \ D₁ ⊆ *ν*(Γ), we have

$$M \subseteq \nu(\varGamma) = \varGamma(\nu(\varGamma)) \subseteq S\_1^2 \cup S\_?^2. \tag{2}$$

For all (s, t) ∈ M,

$$\begin{aligned} (s,t) &\in \nu(\varGamma) \land (s,t) \notin D\_1 \quad [(1)] \\ &\Rightarrow (s,t) \in \Gamma(\nu(\varGamma)) \land (s,t) \notin S\_1^2 \\ &\Rightarrow \forall \omega \in \Omega(\tau(s), \tau(t)) : \operatorname{support}(\omega) \subseteq \nu(\varGamma). \end{aligned} \tag{3}$$

For each (s, t) ∈ M, let

$$\omega\_{s,t} = \underset{\omega \in \Omega(\tau(s), \tau(t))}{\text{argmin}} \sum\_{u,v \in S} \omega(u,v) \,\mu(\Delta)(u,v). \tag{4}$$

We distinguish the following two cases.

– Assume that there exists (s, t) ∈ M such that support(ω<sub>s,t</sub>) ∩ D₁ ≠ ∅. Let

$$p = \sum\_{(u,v)\in\nu(\varGamma)\cap D\_1} \omega\_{s,t}(u,v).$$

By (3), we have that support(ω<sub>s,t</sub>) ⊆ *ν*(Γ). Since support(ω<sub>s,t</sub>) ∩ D₁ ≠ ∅ by assumption, we can conclude that p > 0. Again using the fact that support(ω<sub>s,t</sub>) ⊆ *ν*(Γ), we have that

$$\sum\_{(u,v)\in\nu(\varGamma)\backslash D\_1} \omega\_{s,t}(u,v) = 1 - p.\tag{5}$$

Furthermore,

$$\begin{split} m &= \mu(\Delta)(s,t) \\ &= \Delta(\mu(\Delta))(s,t) \\ &= \min\_{\omega \in \Omega(\tau(s),\tau(t))} \sum\_{u,v \in S} \omega(u,v)\,\mu(\Delta)(u,v) \\ &= \sum\_{u,v \in S} \omega\_{s,t}(u,v)\,\mu(\Delta)(u,v) \quad [(4)] \\ &= \sum\_{(u,v) \in \nu(\varGamma)} \omega\_{s,t}(u,v)\,\mu(\Delta)(u,v) \quad [(3)] \\ &= \sum\_{(u,v) \in \nu(\varGamma) \cap D\_{1}} \omega\_{s,t}(u,v)\,\mu(\Delta)(u,v) + \sum\_{(u,v) \in \nu(\varGamma) \backslash D\_{1}} \omega\_{s,t}(u,v)\,\mu(\Delta)(u,v) \\ &= p + \sum\_{(u,v) \in \nu(\varGamma) \backslash D\_{1}} \omega\_{s,t}(u,v)\,\mu(\Delta)(u,v) \\ &\ge p + (1-p)m. \end{split}$$

The last step follows from (5) and the fact that *μ*(Δ)(u, v) ≥ m for all (u, v) ∈ *ν*(Γ) \ D₁. From the facts that p > 0 and m ≥ p + (1 − p)m we can conclude that m ≥ 1. This contradicts (1).

– Otherwise, support(ω<sub>s,t</sub>) ∩ D₁ = ∅ for all (s, t) ∈ M. Next, we will show that M is a probabilistic bisimulation under this assumption. From the fact that M is a probabilistic bisimulation, we can conclude from Theorem 1 that *μ*(Δ)(s, t) = 0 for all (s, t) ∈ M. Hence, since M ≠ ∅ we have that M ∩ S₀² ≠ ∅, which contradicts (2).

Next, we prove that M is a probabilistic bisimulation. Let (s, t) ∈ M. Since M ⊆ *ν*(Γ) \ D₁ by (1), we have that (s, t) ∉ D₁ and, hence, Δ(*μ*(Δ))(s, t) = *μ*(Δ)(s, t) < 1. From the definition of Δ, we can conclude that ℓ(s) = ℓ(t). Since

$$\begin{aligned} m &= \mu(\Delta)(s, t) \\ &= \sum\_{(u, v) \in \nu(\varGamma) \backslash D\_1} \omega\_{s, t}(u, v) \,\mu(\Delta)(u, v) \quad \text{[as above]} \end{aligned}$$

and *μ*(Δ)(u, v) ≥ m for all (u, v) ∈ *ν*(Γ) \ D₁, we can conclude that *μ*(Δ)(u, v) = m for all (u, v) ∈ support(ω<sub>s,t</sub>). Hence, support(ω<sub>s,t</sub>) ⊆ M. Therefore, M is a probabilistic bisimulation.

#### **Theorem 2.** D₁ = *ν*(Γ)*.*

*Proof.* Immediate consequence of Propositions 3 and 4.

We have shown that D₁ can be characterized as the greatest fixed point of Γ. Next, we will show that D₁ can be decided in polynomial time.

**Theorem 3.** *Distance one can be decided in* O(n² + m²)*.*

*Proof.* As we will show in Theorem 5, distance smaller than one can be decided in O(n² + m²). Hence, distance one can be decided in O(n² + m²) as well.

#### **4 Distance Smaller Than One**

To compute the set of state pairs which have distance one, we can first compute the set of state pairs which have distance less than one. The latter set we denote by D<1. We can then obtain D₁ by taking the complement of D<1. As we will discuss below, D<1 can be characterized as the least fixed point of the following function.

**Definition 6.** *The function* Π : 2<sup>S²</sup> → 2<sup>S²</sup> *is defined by*

$$
\Pi(X) = S^2 \backslash \Gamma(S^2 \backslash X).
$$

The next theorem follows from Theorem 2.

#### **Theorem 4.** D<1 = *μ*(Π)*.*

Next, we show that the computation of D<1 can be formulated as a reachability problem on a directed graph which is induced by the labelled Markov chain. Thus, we can use standard search algorithms, for example, breadth-first search, on the induced graph.

The graph induced by the labelled Markov chain is defined as follows.

**Definition 7.** *The directed graph* G = (V,E) *is defined by*

$$\begin{array}{l} V = S\_0^2 \cup S\_?^2 \\ E = \left\{ \langle (u,v), (s,t) \rangle \mid \tau(s)(u) > 0 \land \tau(t)(v) > 0 \right\} \end{array}$$

We are left to show that in the graph G defined above, a vertex (s, t) is reachable from some vertex in S₀² if and only if the state pair (s, t) in the labelled Markov chain has distance less than one.

As we have discussed earlier, if a state pair (s, t) has distance one, then either s and t have different labels, or for all couplings ω ∈ Ω(τ(s), τ(t)) we have that support(ω) ⊆ D₁. To avoid the universal quantification over couplings, we will use Proposition 1 in the proof of the following proposition.

**Proposition 5.** *μ*(Π) = { (s, t) | (s, t) *is reachable from some* (u, v) ∈ S₀² }*.*

**Theorem 5.** *Distance smaller than one can be decided in* O(n² + m²)*.*

*Proof.* Distance smaller than one can be decided as follows.

1. Decide distance zero, giving S₀².
2. Construct the directed graph G of Definition 7.
3. Compute the set of vertices of G reachable from S₀², using breadth-first search.
By Theorem 4 and Proposition 5, we have that s and t have distance smaller than one if and only if (s, t) is reachable in the directed graph G from some (u, v) such that u and v have distance zero. These reachable state pairs can be computed using breadth-first search, with the queue initially containing S₀².

Distance zero, that is, probabilistic bisimilarity, can be decided in O(m log n) as shown by Derisavi et al. in [10]. The directed graph G has n² vertices and m² edges. Hence, breadth-first search takes O(n² + m²).
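The reachability computation above can be sketched as follows (hypothetical Python, on an illustrative chain that is not from the paper; note that an edge ⟨(u, v), (s, t)⟩ points from a pair to the pairs that can move to it, and that pairs with distinct labels are not vertices of G):

```python
from collections import deque
from itertools import product

# An illustrative chain: s loops, u loops, and t moves to s or u.
tau = {'s': {'s': 1.0}, 't': {'s': 0.9, 'u': 0.1}, 'u': {'u': 1.0}}
ell = {'s': 'a', 't': 'a', 'u': 'b'}

S2 = set(product(tau, tau))
S0 = {(x, x) for x in tau}   # distance-zero pairs; here no two distinct states are bisimilar

def distance_smaller_than_one(tau, ell, S0):
    """BFS in the graph G of Definition 7, with the queue initially containing S0."""
    queue, seen = deque(S0), set(S0)
    while queue:
        u, v = queue.popleft()
        for s, t in product(tau, tau):
            if ell[s] != ell[t]:
                continue   # (s, t) lies in S_1^2 and is not a vertex of G
            # edge <(u, v), (s, t)> whenever tau(s)(u) > 0 and tau(t)(v) > 0
            if (s, t) not in seen and tau[s].get(u, 0) > 0 and tau[t].get(v, 0) > 0:
                seen.add((s, t))
                queue.append((s, t))
    return seen

D1 = S2 - distance_smaller_than_one(tau, ell, S0)   # the complement is distance one
assert D1 == {(x, y) for (x, y) in S2 if ell[x] != ell[y]}
```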

#### **5 Number of Non-trivial Distances**

As we have already discussed earlier, distance zero captures that states behave exactly the same, that is, they are probabilistically bisimilar, and distance one indicates that states behave very differently. The remaining distances, that is, those greater than zero and smaller than one, we call non-trivial. Being able to determine quickly the number of non-trivial distances of a labelled Markov chain allows us to decide whether computing all these non-trivial distances (using some policy iteration algorithm) is feasible.

To determine the number of non-trivial distances of a labelled Markov chain, we use the following algorithm.

1. Decide distance zero, giving D₀.
2. Decide distance one, giving D₁.
3. Return |S²| − |D₀| − |D₁|.
As first proved by Baier [4], distance zero, that is, probabilistic bisimilarity, can be decided in polynomial time. As we proved in Theorem 3, distance one can be decided in polynomial time as well. Hence, we can compute the number of non-trivial distances in polynomial time.

To decide distance zero, we implemented the algorithm to decide probabilistic bisimilarity due to Derisavi et al. [10] in Java. We also implemented our algorithm to decide distance one, described in the proof of Theorems 3 and 5.

We applied our implementation to labelled Markov chains that model randomized algorithms and probabilistic protocols. These labelled Markov chains have been obtained from the verification tool PRISM [20]. We compute the number of non-trivial distances for two models: the randomized self-stabilising algorithm due to Herman [14] and the bounded retransmission protocol by Helmink et al. [13].

For the randomized self-stabilising algorithm, the size of the labelled Markov chain grows exponentially in the number of processes, N. The results for the randomized self-stabilising algorithm are shown in the table below. As we can see from the table, for systems up to 128 states, the algorithm runs for less than a second. For the system with 512 states, the algorithm terminates within seven minutes. For the case N = 3, there are only 12 non-trivial distances. This number is so small that we can easily compute all the non-trivial distances. In Sect. 6 we will use the simple policy iteration algorithm as the next step to compute them. The same applies to the case N = 5. For N = 7 or 9, the number of non-trivial distances is around 11,000 and 200,000, respectively. This makes computing all of them infeasible. Thus, instead of computing all of them, we need to find alternative ways to handle systems with a large number of non-trivial distances. We will discuss two alternative ways in Sects. 7 and 8. Moreover, in this example, as |D₁| = |S₁²|, we know that the state pairs with distance one are exactly those that have different labels.


In the bounded retransmission protocol, there are two parameters: N denotes the number of chunks and M the maximum allowed number of retransmissions of each chunk. The results are shown in the table below. The algorithm can handle systems up to 3,526 states within 11 min. In this example, there are no non-trivial distances. As a consequence, deciding distance zero and one suffices to compute all the distances in this case.


#### **6 All Distances**

To compute all distances of a labelled Markov chain, we augment the existing state-of-the-art algorithm, which is based on algorithms due to Derisavi et al. [10] (step 1) and Bacci et al. [2] (step 3), by incorporating our decision procedure (step 2) as follows.

1. Decide distance zero, giving D₀.
2. Decide distance one, giving D₁.
3. Run simple policy iteration on the remaining state pairs.
Given that we not only decide distance zero, but also distance one, before running simple policy iteration, the correctness of the simple policy iteration algorithm in the augmented setting needs an adjusted proof.

As we already discussed in the previous section, steps 1 and 2 take polynomial time. However, step 3 may take at least exponential time in the worst case, as we have shown in [27]. Hence, the overall algorithm is exponential time.

The first example we consider here is the synchronous leader election protocol of Itai and Rodeh [15], which is taken from PRISM. The protocol takes the number of processors, N, and a constant K as parameters. We compare the running time of our new algorithm with the state-of-the-art algorithm, which combines algorithms due to Derisavi et al. and due to Bacci et al. The results are shown in the table below. In this protocol, the number of non-trivial distances is zero. Thus, our new algorithm terminates without running step 3, which is the simple policy iteration algorithm. On the other hand, the original simple policy iteration algorithm computes the distances of all the elements in the set D₁ \ S₁², the size of which is huge, as can be seen from the last two columns of the table.


The simple policy iteration algorithm can only handle a limited number of states. For the labelled Markov chain with 26 states (N = 3 and K = 2) the simple policy iteration algorithm takes four seconds, while our new algorithm takes one millisecond. The speed-up is more than 4,000 times. For the labelled Markov chain with 61 states (N = 4 and K = 2), the simple policy iteration algorithm runs in 812 s, while our new algorithm takes three milliseconds, a speed-up of more than 250,000 times. The biggest system the simple policy iteration algorithm can handle is the one with 147 states (N = 3 and K = 4) and it takes more than 49 h. In contrast, our new algorithm terminates within 13 ms. That makes the new algorithm seven orders of magnitude faster than the state-of-the-art algorithm. This example also shows that the new algorithm can handle systems with at least 12,400 states.

In the second example, we model two dice, one implemented with a fair coin and the other with a biased coin. The goal is to compute the probabilistic bisimilarity distance between these two dice. An implementation of the die algorithm is part of PRISM. The resulting labelled Markov chain has 20 states.

As there are only 30 non-trivial distances, we run the simple policy iteration algorithm as step 3. The new algorithm is about 46 times faster than the original algorithm.


#### **7 Small Distances**

As we have discussed in Sect. 5, for systems whose number of non-trivial distances is so large that computing all of them is infeasible, we have to find alternative ways. In practice, we often only need to identify the state pairs with small distances, so we can cut down the work by computing only the non-trivial distances that are small.

To compute the non-trivial distances smaller than a positive number, ε, we use the following algorithm.

1. Decide distance zero, giving D₀.
2. Decide distance one, giving D₁.
3. Compute the query set

$$Q = \{ (s, t) \in S^2 \backslash (D\_0 \cup D\_1) \mid \Delta(d)(s, t) \le \varepsilon \}$$

where

$$d(s,t) = \begin{cases} 1 \text{ if } (s,t) \in D\_1 \\ 0 \text{ otherwise} \end{cases}$$

4. Run simple partial policy iteration for Q.

The first two steps remain the same. In step 3, we compute a query set Q that contains all state pairs with distances no greater than ε, as shown in Proposition 6. In step 4, we use this set as the query set to run the simple partial policy iteration algorithm by Bacci et al. [2].

**Proposition 6.** *Let* d *be the distance function defined in step 3. For all* (s, t) ∈ S² \ (D₀ ∪ D₁)*, if* *μ*(Δ)(s, t) ≤ ε*, then* Δ(d)(s, t) ≤ ε*.*

Given that we not only decide distance zero, but also distance one, before running simple partial policy iteration, the correctness of the simple partial policy iteration algorithm in the augmented setting needs an adjusted proof.

As we have seen before, steps 1 and 2 take polynomial time. In step 3, computing Δ(d) corresponds to solving a minimum cost network flow problem. Such a problem can be solved in polynomial time using, for example, Orlin's network simplex algorithm [24]. As we have shown in [28], step 4 takes at least exponential time in the worst case. Therefore, the overall algorithm is exponential time.
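For intuition: because d only takes the values 0 and 1, Δ(d)(s, t) equals 1 minus the largest coupling mass that can be placed on pairs outside D₁, which is a bipartite maximum-flow value. The sketch below (hypothetical Python, using exact rationals and Edmonds-Karp as an illustrative stand-in for the network simplex route mentioned above) computes Δ(d) this way:

```python
from collections import deque
from fractions import Fraction

def max_flow(cap, source, sink):
    """Edmonds-Karp on a capacity dict {(a, b): Fraction}; returns the flow value."""
    adj = {}
    for a, b in list(cap):
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
        cap.setdefault((b, a), Fraction(0))   # residual arcs
    flow = Fraction(0)
    while True:
        parent, queue = {source: None}, deque([source])
        while queue and sink not in parent:   # shortest augmenting path
            a = queue.popleft()
            for b in adj[a]:
                if b not in parent and cap[(a, b)] > 0:
                    parent[b] = a
                    queue.append(b)
        if sink not in parent:
            return flow
        path, b = [], sink
        while parent[b] is not None:
            path.append((parent[b], b))
            b = parent[b]
        aug = min(cap[e] for e in path)       # bottleneck capacity
        for a, b in path:
            cap[(a, b)] -= aug
            cap[(b, a)] += aug
        flow += aug

def delta_d(mu, nu, zero_pairs):
    """Delta(d)(s, t) for 0/1-valued d: 1 - max coupling mass on pairs where d = 0."""
    cap = {('src', ('L', u)): p for u, p in mu.items()}
    cap.update({(('R', v), 'snk'): q for v, q in nu.items()})
    cap.update({(('L', u), ('R', v)): Fraction(1)
                for u in mu for v in nu if (u, v) in zero_pairs})
    return 1 - max_flow(cap, 'src', 'snk')

# tau(s) is the point mass on s, tau(t) = {s: 9/10, u: 1/10}, and only (s, s) avoids D_1:
mu = {'s': Fraction(1)}
nu = {'s': Fraction(9, 10), 'u': Fraction(1, 10)}
assert delta_d(mu, nu, {('s', 's')}) == Fraction(1, 10)
```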

We consider the randomized quicksort algorithm, an implementation of which is part of jpf-probabilistic [31]. The input of the algorithm is the list to be sorted. A list of size 6 gives rise to a labelled Markov chain with 82 states. We compare the running time of the new algorithm for small distances (D₀ + D₁ + Q + SPPI) to the original algorithm (D₀ + SPI) and the new algorithm presented in Sect. 6 (D₀ + D₁ + SPI). The original algorithm (D₀ + SPI) takes about 14 h, whereas the new algorithm which incorporates the decision procedure for distance one takes less than 7 h. For ε = 0.1, the new algorithm for small distances takes 57 min. This makes it about 7 times faster than the algorithm presented in Sect. 6 and about 15 times faster than the original simple policy iteration algorithm. For ε = 0.01, the new algorithm for small distances takes even less time, namely 41 min. As can be seen in the table below, the total number of non-trivial distances is 2,300. The simple partial policy iteration algorithm starts with the query set Q but may have to compute the distances of other state pairs as well. The total number of state pairs considered by the simple partial policy iteration algorithm can be found in the column labelled Total.


#### **8 Approximation Algorithm**

We propose another solution to deal with a large number of non-trivial distances: approximating the distances rather than computing their exact values. To approximate the distances such that the approximate values differ from the exact ones by at most α, a positive number, we use the following algorithm.


1. Decide distance zero, giving D₀.
2. Decide distance one, giving D₁.
3. Distance iteration:

$$l(s,t) = \begin{cases} 1 \text{ if } (s,t) \in D\_1 \\ 0 \text{ otherwise} \end{cases} \qquad u(s,t) = \begin{cases} 0 \text{ if } (s,t) \in D\_0 \\ 1 \text{ otherwise} \end{cases}$$

    repeat
      for each (s, t) ∈ S²
        if l(s, t) ≠ u(s, t)
          l(s, t) := Δ(l)(s, t)
          u(s, t) := Δ(u)(s, t)
    until ‖l − u‖ ≤ α

Again, the first two steps remain the same. Step 3 contains the new approximation algorithm called *distance iteration* (DI). In this step, we define two distance functions, a lower bound l and an upper bound u. We repeatedly apply Δ to these two functions until the difference between the non-trivial distances in the two functions is smaller than the threshold α. For each state pair, we end up with an interval of size at most α in which its distance lies. To prove the algorithm correct, we modify the function Δ defining the probabilistic bisimilarity distances slightly, as follows.
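The repeat loop of step 3 can be sketched as follows (hypothetical Python, on an illustrative three-state chain that is not from the paper: ℓ(s) = ℓ(t) = a, ℓ(u) = b, τ(s)(s) = 1, τ(u)(u) = 1, τ(t)(s) = 1 − p, τ(t)(u) = p; here the couplings for the only non-trivial pairs (s, t) and (t, s) are forced, so Δ is computable directly):

```python
p = 0.3
S2 = {(x, y) for x in 'stu' for y in 'stu'}
D0 = {(x, x) for x in 'stu'}                                  # distance-zero pairs
D1 = {(x, y) for (x, y) in S2 if ('u' in (x, y)) and x != y}  # differently labelled pairs

def delta(d, s, t):
    # Both non-trivial pairs share the same forced coupling, which
    # places 1 - p on (s, s) and p on (s, u).
    return (1 - p) * d[('s', 's')] + p * d[('s', 'u')]

alpha = 0.01
l = {q: (1.0 if q in D1 else 0.0) for q in S2}   # lower bound
u = {q: (0.0 if q in D0 else 1.0) for q in S2}   # upper bound

while max(u[q] - l[q] for q in S2) > alpha:
    for q in S2:
        if l[q] != u[q]:
            l[q] = delta(l, *q)
            u[q] = delta(u, *q)

# Both bounds collapse onto the true distance p for the non-trivial pairs.
assert abs(l[('s', 't')] - p) <= alpha and abs(u[('s', 't')] - p) <= alpha
```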

**Definition 8.** *The function* Δ₀ : [0, 1]<sup>S²</sup> → [0, 1]<sup>S²</sup> *is defined by*

$$\Delta\_0(d)(s,t) = \begin{cases} 0 & \text{if } (s,t) \in D\_0 \\ \Delta(d)(s,t) & \text{otherwise} \end{cases}$$

Some properties of Δ₀, which are key to the correctness proof of the above algorithm, are collected in the following theorem.

#### **Theorem 6.**

*(a) The function* Δ₀ *is monotone.*
*(b) The function* Δ₀ *is nonexpansive.*
*(c)* *μ*(Δ₀) = *μ*(Δ)*.*
*(d)* *μ*(Δ₀) = *ν*(Δ₀)*.*
*(e)* *μ*(Δ₀) = sup<sub>m∈ℕ</sub> Δ₀<sup>m</sup>(d₀)*, where* d₀(s, t) = 0 *for all* s, t ∈ S*.*
*(f)* *ν*(Δ₀) = inf<sub>n∈ℕ</sub> Δ₀<sup>n</sup>(d₁)*, where* d₁(s, t) = 1 *for all* s, t ∈ S*.*

Let us use randomized quicksort introduced in Sect. 7 and the randomized self-stabilising algorithm due to Herman [14] introduced in Sect. 5 as examples. Recall that for the randomized self-stabilising algorithm, when N = 7, the number of non-trivial distances is 11,032, which we are not able to handle using the simple policy iteration algorithm. We apply the approximation algorithm to this model and the randomized quicksort example with 82 states and present the results below. The accuracy α is set to be 0.01.

The approximation algorithm for randomized quicksort runs for about 14 min, which is about 3 to 4 times faster than the algorithm for small distances in Sect. 7. For the randomized self-stabilising algorithm with 128 states, the approximation algorithm terminates in about 54 h. Although the number of non-trivial distances for the randomized self-stabilising algorithm is about 5 times that of randomized quicksort, the running time is more than 200 times longer. It is unknown whether this approximation algorithm has exponential running time.


#### **9 Conclusion**

In this paper, we have presented a decision procedure for probabilistic bisimilarity distance one. This decision procedure provides the basis for three new algorithms to compute and approximate the probabilistic bisimilarity distances of a labelled Markov chain. The first algorithm decides distance zero, then decides distance one, and finally uses simple policy iteration to compute the remaining distances. As shown experimentally, this new algorithm significantly improves on the state-of-the-art algorithm that only decides distance zero and then uses simple policy iteration. The second algorithm computes all probabilistic bisimilarity distances that are smaller than some given upper bound, by deciding distance zero, deciding distance one, computing a query set, and running simple partial policy iteration for that query set. This second algorithm can handle labelled Markov chains that have considerably more non-trivial distances than our first algorithm. The third algorithm approximates the probabilistic bisimilarity distances up to a given accuracy, by deciding distance zero, deciding distance one, and running distance iteration. This third algorithm can also handle labelled Markov chains that have considerably more non-trivial distances than our first algorithm. Whereas we know that the first two algorithms take at least exponential time in the worst case, the worst-case running time of the third algorithm has not yet been determined.

Moreover, if we are only interested in the probabilistic bisimilarity distances for a few state pairs, then with the pre-computation of distance zero and one we can exclude the state pairs with trivial distances. We can add the remaining state pairs to a query set and run simple partial policy iteration to get the distances. Alternatively, we can modify the distance iteration algorithm to approximate the distances for the predefined state pairs. The details of these new algorithms will be studied in the future.

**Acknowledgements.** The authors would like to thank Daniela Petrisan, Eric Ruppert and Dana Scott for discussions related to this research. The authors are also grateful to the referees for their constructive feedback.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

#### Author Index

Abate, Alessandro I-270 Akshay, S. I-251 Albarghouthi, Aws I-327 Albert, Elvira II-392 Anderson, Greg I-407 Argyros, George I-427 Arndt, Hannah II-3

Backes, John II-20 Bansal, Suguman I-367, II-99 Bardin, Sébastien II-294 Barrett, Clark II-236 Bartocci, Ezio I-449, I-547 Bauer, Matthew S. II-117 Becchi, Anna I-230 Berzish, Murphy II-45 Biere, Armin I-587 Bloem, Roderick I-547 Blondin, Michael I-604 Blotsky, Dmitry II-45 Bonichon, Richard II-294 Bønneland, Frederik M. I-527 Bouajjani, Ahmed II-336, II-372 Büning, Julian II-447

Češka, Milan I-612 Chadha, Rohit II-117 Chakraborty, Supratik I-251 Chatterjee, Krishnendu II-178 Chaudhuri, Swarat II-99 Chen, Taolue II-487 Cheval, Vincent II-28 Chudnov, Andrey II-430 Collins, Nathan II-413, II-430 Cook, Byron I-38, II-430, II-467 Cordeiro, Lucas I-183 Coti, Camille II-354 Cousot, Patrick II-75

D'Antoni, Loris I-386, I-427 David, Cristina I-270 Dillig, Isil I-407 Dodds, Joey II-430 Dohrau, Jérôme II-55

Dreossi, Tommaso I-3 Dureja, Rohit II-37

Eilers, Marco I-596, II-12 Emmi, Michael I-487 Enea, Constantin I-487, II-336, II-372 Esparza, Javier I-604

Fan, Chuchu I-347 Farinier, Benjamin II-294 Fedyukovich, Grigory I-124, I-164 Feng, Yijun I-507 Finkbeiner, Bernd I-144, I-289 Frehse, Goran I-468 Fremont, Daniel J. I-307

Gacek, Andrew II-20 Ganesh, Vijay II-45, II-275 Gao, Pengfei II-157 Gao, Sicun II-219 Ghassabani, Elaheh II-20 Giacobazzi, Roberto II-75 Giacobbe, Mirco I-468 Goel, Shubham I-251 Gómez-Zamalloa, Miguel II-392 Goubault, Eric II-523 Grishchenko, Ilya I-51 Gu, Ronghui II-317 Gupta, Aarti I-124, I-164, II-136

Hahn, Christopher I-144, I-289 Hassan, Mostafa II-12 He, Jinlong II-487 Henzinger, Monika II-178 Henzinger, Thomas A. I-449, I-468 Hsu, Justin I-327 Hu, Qinheping I-386 Huffman, Brian II-430

Isabel, Miguel II-392

Jaax, Stefan I-604 Jansen, Christina II-3 Jensen, Peter Gjøl I-527 Jha, Somesh I-3 Ji, Kailiang II-372

Kabir, Ifaz II-45 Katoen, Joost-Pieter I-507, I-643, II-3 Kelmendi, Edon I-623 Kesseli, Pascal I-183, I-270 Khazem, Kareem II-467 Kolokolova, Antonina II-275 Kong, Hui I-449 Kong, Soonho II-219 Kragl, Bernhard I-79 Krämer, Julia I-623 Kremer, Steve II-28 Křetínský, Jan I-567, I-623 Kroening, Daniel I-183, I-270, II-467 Kulal, Sumith I-251

Larsen, Kim Guldstrand I-527 Li, Haokun I-507 Li, Jianwen II-37 Li, Wenchao I-662 Loitzenbauer, Veronika II-178 Lukert, Philip I-289 Luttenberger, Michael I-578

MacCárthaigh, Colm II-430 Maffei, Matteo I-51 Magill, Stephen II-430 Malik, Sharad II-136 Matheja, Christoph II-3 Mathur, Umang I-347 Matyáš, Jiří I-612 McMillan, Kenneth L. I-191, I-407 Meggendorfer, Tobias I-567 Mertens, Eric II-430 Meyer, Philipp J. I-578 Mitra, Sayan I-347 Mora, Federico II-45 Mrazek, Vojtech I-612 Mullen, Eric II-430 Müller, Peter I-596, II-12, II-55 Münger, Severin II-55 Muñiz, Marco I-527 Mutluergil, Suha Orhun II-336

Namjoshi, Kedar S. I-367 Nguyen, Huyen T. T. II-354 Nickovic, Dejan I-547

Niemetz, Aina I-587, II-236 Noll, Thomas II-3, II-447

Oraee, Simin II-178

Petrucci, Laure II-354 Pick, Lauren I-164 Pike, Lee II-413 Polgreen, Elizabeth I-270 Potet, Marie-Laure II-294 Prasad Sistla, A. II-117 Preiner, Mathias I-587, II-236 Pu, Geguang II-37 Püschel, Markus I-211 Putot, Sylvie II-523

Qadeer, Shaz I-79, II-372 Quatmann, Tim I-643

Rabe, Markus N. II-256 Rakotonirina, Itsaka II-28 Ranzato, Francesco II-75 Rasmussen, Cameron II-256 Reynolds, Andrew II-236 Robere, Robert II-275 Rodríguez, César II-354 Roeck, Franz I-547 Rozier, Kristin Yvonne II-37 Rubio, Albert II-392

Sa'ar, Yaniv I-367 Sahlmann, Lorenz II-523 Satake, Yuki I-105 Schemmel, Daniel II-447 Schneidewind, Clara I-51 Schrammel, Peter I-183 Sekanina, Lukas I-612 Seshia, Sanjit A. I-3, I-307, II-256 Shah, Shetal I-251 Sickert, Salomon I-567, I-578 Singh, Gagandeep I-211 Solar-Lezama, Armando II-219 Song, Fu II-157, II-487 Soria Dustmann, Oscar II-447 Sousa, Marcelo II-354 Srba, Jiří I-527 Stenger, Marvin I-289 Subramanyan, Pramod II-136 Summers, Alexander J. II-55

Tang, Qiyi I-681 Tasiran, Serdar II-336, II-430, II-467 Tautschnig, Michael II-467 Tentrup, Leander I-289, II-256 Tinelli, Cesare II-236 Toman, Viktor II-178 Tomb, Aaron II-413, II-430 Torfah, Hazem I-144 Trtik, Marek I-183 Tullsen, Mark II-413 Tuttle, Mark R. II-467

Unno, Hiroshi I-105 Urban, Caterina II-12, II-55

van Breugel, Franck I-681 van Dijk, Tom II-198 Vardi, Moshe Y. II-37, II-99 Vasicek, Zdenek I-612 Vechev, Martin I-211 Viswanathan, Mahesh I-347, II-117 Vizel, Yakir II-136 Vojnar, Tomáš I-612

Wagner, Lucas II-20 Walther, Christoph II-505 Wang, Chao II-157 Wang, Guozhen II-487 Wang, Xinyu I-407 Wehrle, Klaus II-447 Weininger, Maximilian I-623 Westbrook, Eddy II-430 Whalen, Mike II-20 Wolf, Clifford I-587 Wu, Zhilin II-487 Xia, Bican I-507 Yahav, Eran I-27 Yan, Jun II-487 Yang, Junfeng II-317 Yang, Weikun II-136 Yuan, Xinhao II-317 Zaffanella, Enea I-230 Zhan, Naijun I-507 Zhang, Jun II-157 Zhang, Yueling I-124 Zheng, Yunhui II-45 Zhou, Weichao I-662 Ziegler, Christopher I-567